THERAPEUTIC AND DIAGNOSTIC AGENTS CAPABLE OF MODULATING CELLULAR RESPONSIVENESS TO
CYTOKINES



FIELD OF THE INVENTION 

The present invention relates generally to therapeutic and diagnostic agents. More particularly,
the present invention provides therapeutic molecules capable of modulating signal transduction 
such as but not limited to cytokine-mediated signal transduction. The molecules of the present 
invention are useful, therefore, in modulating cellular responsiveness to cytokines as well as other 
mediators of signal transduction such as endogenous or exogenous molecules, antigens, microbes 
and microbial products, viruses or components thereof, ions, hormones and parasites.

Bibliographic details of the publications referred to in this specification by author are collected 
at the end of the description. Sequence Identity Numbers (SEQ ID NOs.) for the nucleotide and 
amino acid sequences referred to in the specification are defined after the bibliography. A 
summary of the SEQ ID NOs is given in Table 1.

Throughout this specification and the claims which follow, unless the context requires otherwise, 
the word "comprise", or variations such as "comprises" or "comprising", will be understood to 
imply the inclusion of a stated integer or group of integers but not the exclusion of any other 
integer or group of integers.

BACKGROUND OF THE INVENTION 

Cells continually monitor their environment in order to modulate physiological and biochemical
processes which in turn affect future behaviour. Frequently, a cell's initial interaction with its
surroundings occurs via receptors expressed on the plasma membrane. Activation of these 
receptors, whether through binding endogenous ligands (such as cytokines) or exogenous ligands 
(such as antigens), triggers a biochemical cascade from the membrane through the cytoplasm to 
the nucleus. 



Of the endogenous ligands, cytokines represent a particularly important and versatile group.
Cytokines are proteins which regulate the survival, proliferation, differentiation and function of
a variety of cells within the body [Nicola, 1994]. The haemopoietic cytokines have in common 
a four-alpha helical bundle structure and the vast majority interact with a structurally related 
family of cell surface receptors, the type I and type II cytokine receptors [Bazan, 1990; Sprang, 
1993]. In all cases, ligand-induced receptor aggregation appears to be a critical event in initiating
intracellular signal transduction cascades. Some cytokines, for example growth hormone, 
erythropoietin (Epo) and granulocyte-colony-stimulating factor (G-CSF), trigger receptor 
homodimerisation, while for other cytokines, receptor heterodimerisation or heterotrimerisation 
is crucial. In the latter cases, several cytokines share common receptor subunits and on this basis
can be grouped into three subfamilies with similar patterns of intracellular activation and similar
biological effects [Hilton, 1994]. Interleukin-3 (IL-3), IL-5 and granulocyte-macrophage colony-
stimulating factor (GM-CSF) use the common β-receptor subunit (βc) and each cytokine
stimulates the production and functional activity of granulocytes and macrophages. IL-2, IL-4,
IL-7, IL-9, and IL-15 each use the common γ-chain (γc), while IL-4 and IL-13 share an
alternative γ-chain (γ′c or IL-13 receptor α-chain). Each of these cytokines plays an important
role in regulating acquired immunity in the lymphoid system. Finally, IL-6, IL-11, leukaemia
inhibitory factor (LIF), oncostatin-M (OSM), ciliary neurotrophic factor (CNTF) and
cardiotrophin (CT) share the receptor subunit gp130. Each of these cytokines appears to be
highly pleiotropic, having effects both within and outside the haemopoietic system [Nicola,
1994].

In all of the above cases at least one subunit of each receptor complex contains the conserved
sequence elements, termed box1 and box2, in their cytoplasmic tails [Murakami, 1991]. Box1
is a proline-rich motif which is located more proximal to the transmembrane domain than the
acidic box2 element. The box1 region serves as the binding site for a class of cytoplasmic
tyrosine kinases termed JAKs (Janus kinases). Ligand-induced receptor dimerisation serves to
increase the catalytic activity of the associated JAKs through cross-phosphorylation. Activated
JAKs then tyrosine phosphorylate several substrates, including the receptors themselves.
Specific phosphotyrosine residues on the receptor then serve as docking sites for SH2-containing
proteins, the best characterised of which are the signal transducers and activators of transcription
(STATs) and the adaptor protein, shc. The STATs are then phosphorylated on tyrosines,
probably by JAKs, dissociate from the receptor and form either homodimers or heterodimers
through the interaction of the SH2 domain of one STAT with the phosphotyrosine residue of the
other. STAT dimers then translocate to the nucleus where they bind to specific cytokine-
responsive promoters and activate transcription [Darnell, 1994; Ihle, 1995; Ihle, 1995]. In a
separate pathway, tyrosine phosphorylated shc interacts with another SH2 domain-containing
protein, Grb-2, leading ultimately to activation of members of the MAP kinase family and in turn
transcription factors such as fos and jun [Sato, 1993; Cutler, 1993]. These pathways are not
unique to members of the cytokine receptor family since cytokines that bind receptor tyrosine
kinases are also able to activate STATs and members of the MAP kinase family [David, 1996;
Leaman, 1996; Shuai, 1993; Sato, 1993; Cutler, 1993].

Four members of the JAK family of cytoplasmic tyrosine kinases have been described, JAK1,
JAK2, JAK3 and TYK2, each of which binds to a specific subset of cytokine receptor subunits.
Six STATs have been described (STAT1 through STAT6), and these too are activated by
distinct cytokine/receptor complexes. For example, STAT1 appears to be functionally specific
to the interferon system, STAT4 appears to be specific to IL-12, while STAT6 appears to be
specific for IL-4 and IL-13. Thus, despite common activation mechanisms some degree of
cytokine specificity may be achieved through the use of specific JAKs and STATs [Thierfelder,
1996; Kaplan, 1996; Takeda, 1996; Shimoda, 1996; Meraz, 1996; Durbin, 1996].

In addition to those described above, there are clearly other mechanisms of activation of these
pathways. For example, the JAK/STAT pathway appears to be able to activate MAP kinases
independent of the shc-induced pathway [David, 1995] and the STATs themselves can be
activated without binding to the receptor, possibly by direct interaction with JAKs [Gupta,
1996]. Conversely, full activation of STATs may require the action of MAP kinase in addition
to that of JAKs [David, 1995; Wen, 1995].

While the activation of these signalling pathways is becoming better understood, little is known
of the regulation of these pathways, including employment of negative or positive feedback
loops. This is important since once a cell has begun to respond to a stimulus, it is critical that
the intensity and duration of the response is regulated and that signal transduction is switched
off. It is likewise desirable to increase the intensity of a response systemically or even locally as
the situation requires.

WO 98/20023 PCT/AU97/00729



In work leading up to the present invention, the inventors sought to isolate negative regulators
of signal transduction. The inventors have now identified a new family of proteins which are
capable of acting as regulators of signalling. The new family of proteins is defined as the
suppressor of cytokine signalling (SOCS) family based on the ability of the initially identified
SOCS molecules to suppress cytokine-mediated signalling. It should be noted, however, that
not all members of the SOCS family need necessarily share suppressor function nor target solely
cytokine mediated signalling. The SOCS family comprises at least three classes of protein
molecules based on amino acid sequence motifs located N-terminal of a C-terminal motif called
the SOCS box. The identification of this new family of regulatory molecules permits the
generation of a range of effector or modulator molecules capable of modulating signal
transduction and, hence, cellular responsiveness to a range of molecules including cytokines.

The present invention, therefore, provides therapeutic and diagnostic agents based on SOCS
proteins, derivatives, homologues, analogues and mimetics thereof as well as agonists and
antagonists of SOCS proteins.

SUMMARY OF THE INVENTION

The present invention provides inter alia nucleic acid molecules encoding members of the SOCS
family of proteins as well as the proteins themselves. Reference hereinafter to "SOCS"
encompasses any or all members of the SOCS family. Specific SOCS molecules are defined
numerically such as, for example, SOCS1, SOCS2 and SOCS3. The species from which the
SOCS has been obtained may be indicated by a preface of a single letter abbreviation where "h"
is human, "m" is murine and "r" is rat. Accordingly, "mSOCS1" is a specific SOCS from a murine
animal. Reference herein to "SOCS" is not to imply that the protein solely suppresses
cytokine-mediated signal transduction, as the molecule may modulate other effector-mediated
signal transductions such as by hormones or other endogenous or exogenous molecules,
antigens, microbes and microbial products, viruses or components thereof, ions, hormones and
parasites. The term "modulates" encompasses up-regulation, down-regulation as well as
maintenance of particular levels.
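
The naming convention described above (an optional species letter, then "SOCS", then a family number) can be illustrated with a short sketch. This is not part of the specification: the function name `parse_socs_name` and the regular expression are hypothetical, chosen only to show how a name such as "mSOCS1" decomposes.

```python
import re

# Species prefixes as given in the text: "h" human, "m" murine, "r" rat.
SPECIES = {"h": "human", "m": "murine", "r": "rat"}

# Hypothetical pattern: optional species letter, "SOCS", then the member number.
_NAME = re.compile(r"^([hmr])?SOCS(\d+)$")


def parse_socs_name(name):
    """Split a name such as 'mSOCS1' into (species, member number).

    The species is None when no prefix is present (e.g. 'SOCS3').
    """
    match = _NAME.match(name.replace(" ", ""))  # tolerate stray spaces
    if match is None:
        raise ValueError(f"not a SOCS name: {name!r}")
    prefix, number = match.groups()
    return SPECIES.get(prefix), int(number)
```

For instance, `parse_socs_name("mSOCS1")` yields the murine species label together with the member number 1, while an unprefixed name returns `None` for the species.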

One aspect of the present invention provides a nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its C-
terminal region.

Another aspect of the present invention provides a nucleic acid molecule comprising a sequence
of nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its C-
terminal region and a protein:molecule interacting region.

Yet another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a C-
terminal region and a protein:molecule interacting region located in a region N-terminal of the
SOCS box.

Preferably, the protein:molecule interacting region is a protein:DNA or protein:protein binding
region.

Still a further aspect of the present invention provides a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a
SOCS box in its C-terminal region and one or more of an SH2 domain, WD-40 repeats or
ankyrin repeats N-terminal of the SOCS box.

Even still a further aspect of the present invention is directed to a nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
protein or a derivative, homologue, analogue or mimetic thereof or a nucleotide sequence
capable of hybridizing thereto under low stringency conditions at 42°C wherein said protein
comprises a SOCS box in its C-terminal region wherein the SOCS box comprises the amino acid
sequence:

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
    X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P;

and a protein:molecule interacting region such as but not limited to one or more of an SH2
domain, WD-40 repeats and/or ankyrin repeats N-terminal of the SOCS box.
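
The SOCS box consensus defined above reads naturally as a pattern over one-letter amino acid codes. The following is a minimal sketch, not a method taken from the specification: it assumes the 20 standard residues for "any amino acid", and the names `socs_box_regex`, `AA` and `n_max` are hypothetical. The test sequence in the usage note is synthetic and is not one of the listed SEQ ID NOs.

```python
import re

# "Any amino acid" is assumed here to mean one of the 20 standard residues.
AA = "[ACDEFGHIKLMNPQRSTVWY]"


def socs_box_regex(n_max=50):
    """Compile the X1..X28 consensus, with spacers [Xi]n and [Xj]n of 1..n_max residues."""
    spacer = f"{AA}{{1,{n_max}}}"
    parts = [
        "[LIVMAP]",      # X1
        AA,              # X2
        "[PTS]",         # X3
        "[LIVMAP]",      # X4
        AA, AA,          # X5, X6
        "[LIVMAFYW]",    # X7
        "[CTS]",         # X8
        "[RKH]",         # X9
        AA, AA,          # X10, X11
        "[LIVMAP]",      # X12
        AA, AA, AA,      # X13, X14, X15
        "[LIVMAPGCTS]",  # X16
        spacer,          # [Xi]n
        "[LIVMAP]",      # X17
        AA, AA,          # X18, X19
        "[LIVMAP]",      # X20
        "P",             # X21
        "[LIVMAPG]",     # X22
        "[PN]",          # X23
        spacer,          # [Xj]n
        "[LIVMAP]",      # X24
        AA, AA,          # X25, X26
        "[YF]",          # X27
        "[LIVMAP]",      # X28
    ]
    return re.compile("".join(parts))


def has_socs_box(sequence, n_max=50):
    """Return True if the consensus occurs anywhere in the one-letter sequence."""
    return socs_box_regex(n_max).search(sequence.upper()) is not None
```

A synthetic sequence built to satisfy every position, such as `"LAPLAALCRAAIAAAG" + "AA" + "LAALPLP" + "A" + "LAAYL"`, is accepted by `has_socs_box`, whereas a run of alanines is rejected because position X3 requires P, T or S. Lowering `n_max` gives a narrower spacer range.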

Another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence: 

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
    X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(ii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the SOCS molecules modulate signal transduction such as from a cytokine or
hormone or other endogenous or exogenous molecule, a microbe or microbial product, an
antigen or a parasite.

More preferably, the SOCS molecules modulate cytokine mediated signal transduction.

Still another aspect of the present invention comprises a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or comprises a nucleotide sequence capable
of hybridizing thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) is capable of modulating signal transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
    X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(iii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.



Preferably, the signal transduction is mediated by a cytokine such as one or more of EPO, TPO,
G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNα, TNFα, IL-1 and/or
M-CSF.


Preferably, the signal transduction is mediated by one or more of Interleukin 6 (IL-6), Leukaemia
Inhibitory Factor (LIF), Oncostatin M (OSM), Interferon (IFN)-α and/or thrombopoietin.

Preferably, the signal transduction is mediated by IL-6. 

Particularly preferred nucleic acid molecules comprise nucleotide sequences substantially set
forth in SEQ ID NO:3 (mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ
ID NO:9 (hSOCS1), SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15
and SEQ ID NO:16 (hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ
ID NO:20 (mSOCS6), SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24
(mSOCS7), SEQ ID NO:26 and SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ
ID NO:30 (mSOCS9), SEQ ID NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33
and SEQ ID NO:34 (hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12),
SEQ ID NO:38 and SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42
(hSOCS13), SEQ ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47
(hSOCS15) or a nucleotide sequence having at least about 15% similarity to all or a region of
any of the listed sequences or a nucleic acid molecule capable of hybridizing to any one of the
listed sequences under low stringency conditions at 42°C.

Another aspect of the present invention relates to a protein or a derivative, homologue, analogue
or mimetic thereof comprising a SOCS box in its C-terminal region.

Yet another aspect of the present invention is directed to a protein or a derivative, homologue, 
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a 
protein:molecule interacting region.

Even yet another aspect of the present invention provides a protein or a derivative, homologue, 
analogue or mimetic thereof comprising an interacting region located in a region N-terminal of 
the SOCS box. 

Preferably, the protein:molecule interacting region is a protein:DNA or a protein:protein binding
region.

Another aspect of the present invention contemplates a protein or a derivative, homologue,
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a SH2 domain, 
WD-40 repeats or ankyrin repeats N-terminal of the SOCS box. 

Still yet another aspect of the present invention provides a protein or a derivative, homologue,
analogue or mimetic thereof exhibiting the following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence:

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
    X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(ii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the proteins modulate signal transduction such as cytokine-mediated signal
transduction.

Preferred cytokines are EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF,
IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.

A particularly preferred cytokine is IL-6. 

Even yet another aspect of the present invention provides a protein or derivative, homologue, 
analogue or mimetic thereof exhibiting the following characteristics: 
(i) is capable of modulating signal transduction such as cytokine-mediated signal
transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
    X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(iii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Particularly preferred SOCS proteins comprise an amino acid sequence substantially as set forth
in SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15) or an amino acid
sequence having at least 15% similarity to all or a region of any one of the listed sequences.

Another aspect of the present invention contemplates a method of modulating levels of a SOCS 
protein in a cell said method comprising contacting a cell containing a SOCS gene with an 
effective amount of a modulator of SOCS gene expression or SOCS protein activity for a time 
and under conditions sufficient to modulate levels of said SOCS protein. 


A related aspect of the present invention provides a method of modulating signal transduction 
in a cell containing a SOCS gene comprising contacting said cell with an effective amount of a 
modulator of SOCS gene expression or SOCS protein activity for a time sufficient to modulate 
signal transduction. 


Yet a further related aspect of the present invention is directed to a method of influencing 
interaction between cells wherein at least one cell carries a SOCS gene, said method comprising
contacting the cell carrying the SOCS gene with an effective amount of a modulator of SOCS 
gene expression or SOCS protein activity for a time sufficient to modulate signal transduction. 

In accordance with the present invention, n in [Xi]n and [Xj]n may, in addition to being 1-50,
be from 1-30, 1-20, 1-10 or 1-5.

A summary of the SEQ ID NOs referred to in the subject specification is given in Table 1. 

TABLE 1

SUMMARY OF SEQUENCE IDENTITY NUMBERS

SEQUENCE                                                   SEQ ID NO.

PCR Primer                                                 1
PCR Primer                                                 2
Mouse SOCS1 (nucleotide)                                   3
Mouse SOCS1 (amino acid)                                   4
Mouse SOCS2 (nucleotide)                                   5
Mouse SOCS2 (amino acid)                                   6
Mouse SOCS3 (nucleotide)                                   7
Mouse SOCS3 (amino acid)                                   8
Human SOCS1 (nucleotide)                                   9
Human SOCS1 (amino acid)                                   10
Rat SOCS1 (nucleotide)                                     11
Rat SOCS1 (amino acid)                                     12
nucleotide sequence of murine SOCS4                        13
amino acid sequence of murine SOCS4                        14
nucleotide sequence of SOCS4 cDNA human contig 4.1         15
nucleotide sequence of SOCS4 cDNA human contig 4.2         16
nucleotide sequence of murine SOCS5                        17
amino acid sequence of murine SOCS5                        18
nucleotide sequence of human SOCS5                         19
nucleotide sequence of murine SOCS6                        20
amino acid sequence of murine SOCS6                        21
nucleotide sequence of human SOCS6 contig h6.1             22
nucleotide sequence of human SOCS6 contig h6.2             23
nucleotide sequence of murine SOCS7                        24
amino acid sequence of murine SOCS7                        25
nucleotide sequence of human SOCS7 contig h7.1             26
nucleotide sequence of human SOCS7 contig h7.2             27
nucleotide sequence of murine SOCS8                        28
amino acid sequence of murine SOCS8                        29
nucleotide sequence of murine SOCS9                        30
nucleotide sequence of human SOCS9                         31
nucleotide sequence of murine SOCS10                       32
nucleotide sequence of human SOCS10 contig h10.1           33
nucleotide sequence of human SOCS10 contig h10.2           34
nucleotide sequence of human SOCS11                        35
amino acid sequence of human SOCS11                        36
nucleotide sequence of mouse SOCS12                        37
nucleotide sequence of human SOCS12 contig h12.1           38
nucleotide sequence of human SOCS12 contig h12.2           39
nucleotide sequence of murine SOCS13                       40
amino acid sequence of murine SOCS13                       41
nucleotide sequence of human SOCS13 cDNA contig h13.1      42
nucleotide sequence of murine SOCS14 cDNA                  43
amino acid sequence of murine SOCS14                       44
nucleotide sequence of murine SOCS15 cDNA                  45
amino acid sequence of murine SOCS15                       46
nucleotide sequence of human SOCS15                        47
amino acid sequence of human SOCS15                        48

Single and three letter abbreviations are used to denote amino acid residues and these are
summarized in Table 2.



TABLE 2

Amino Acid       Three-letter     One-letter
                 Abbreviation     Symbol

Alanine          Ala              A
Arginine         Arg              R
Asparagine       Asn              N
Aspartic acid    Asp              D
Cysteine         Cys              C
Glutamine        Gln              Q
Glutamic acid    Glu              E
Glycine          Gly              G
Histidine        His              H
Isoleucine       Ile              I
Leucine          Leu              L
Lysine           Lys              K
Methionine       Met              M
Phenylalanine    Phe              F
Proline          Pro              P
Serine           Ser              S
Threonine        Thr              T
Tryptophan       Trp              W
Tyrosine         Tyr              Y
Valine           Val              V
Any residue      Xaa              X
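
The abbreviations in Table 2 amount to a simple lookup. As an illustrative sketch only, the mapping can be written out and used to convert three-letter residue names into the one-letter strings used in the SOCS box definitions; the names `THREE_TO_ONE` and `to_one_letter` are hypothetical and do not appear in the specification.

```python
# Three-letter to one-letter mapping, transcribed from Table 2.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any residue
}


def to_one_letter(residues):
    """Convert an iterable of three-letter codes into a one-letter string.

    Input case is normalised, so 'ALA', 'ala' and 'Ala' are all accepted.
    Unknown codes raise KeyError rather than being silently skipped.
    """
    return "".join(THREE_TO_ONE[code.capitalize()] for code in residues)
```

Calling `to_one_letter(["Leu", "Ile", "Val"])` joins the corresponding one-letter symbols into a single string.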



BRIEF DESCRIPTION OF THE DRAWINGS



In some of the Figures, abbreviations are used to denote SOCS proteins with certain binding 
motifs. SOCS proteins which contain WD-40 repeats are referred to as WSB1-WSB4. SOCS 
proteins with ankyrin repeats are referred to as ASB1-ASB3.

Figure 1 is a diagrammatic representation showing generation of an IL-6-unresponsive M1 clone
by retroviral infection. The RUFneo retrovirus, showing the position of landmark restriction 
endonuclease cleavage sites, the 4A2 cDNA insert and the position of PCR primer sequences. 


Figure 2 is a photographic representation of Southern and Northern analysis. (Left and Middle
Panels) Southern blot analysis of genomic DNA from clone 4A2 and a control infected M1 clone.
DNA was digested with BamH I, to reveal the number of retroviruses carried by each clone, and
Sac I, to estimate the size of the retroviral cDNA insert. Left panel; probed with neo. Middle
panel; probed with the Xho I-digested 4A2 PCR product. (Right Panel) Northern blot analysis
of total RNA from clone 4A2 and a control infected M1 clone, probed with the Xho I-digested
4A2 PCR product. The two bands represent unspliced and spliced retroviral transcripts,
resulting from splice donor and acceptor sites in the retroviral genome.

Figure 3 is a representation of the nucleotide sequence and structure of the SOCS1 gene. A.
The genomic context of SOCS1 in relation to the protamine gene cluster on murine chromosome
16. The accession number of this locus is MMPRMGNS (direct submission; G. Schlueter, 1995)
for the mouse and BTPRMTNP2 for the rat (direct submission; G. Schlueter, 1996). B. The
nucleotide sequence of the SOCS1 cDNA and deduced amino acid sequence. Conventional one
letter abbreviations are used for the amino acid sequence and the asterisk indicates the stop
codon. The polyadenylation signal sequence is underlined. The coding region is shown in
uppercase and the untranslated region is shown in lower case.



Figure 4 is a graphical representation of cell differentiation in the presence of cytokines. Semi-
solid agar cultures of parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1 (4A2
and M1.mpl.SOCS1), were used and the percentage of colonies which differentiated in response
to a titration of 1 mg/ml IL-6 (•), 100 ng/ml LIF (○), 1 mg/ml OSM (□), 100 ng/ml IFN-γ (*),
500 ng/ml TPO (•), or 3 × 10⁻⁶ M dexamethasone (#) determined.

Figure 5 is a photographic representation of cytospins of liquid cultures of parental M1 cells 
(M1 and M1.mpl) and M1 cells expressing SOCS1 (4A2 and M1.mpl.SOCS1) cultured for 4 
days in the presence of 10 ng/ml IL-6 or saline. Unlike parental M1 cells, morphological features 
consistent with macrophage differentiation are not observed in M1 cells constitutively expressing 
SOCS1 (4A2 and M1.mpl.SOCS1) when cultured in IL-6. 

Figure 6 is a photographic representation showing inhibition of phosphorylation of signalling 
molecules by SOCS1. Parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1 
(4A2 and M1.mpl.SOCS1) were incubated in the absence (-) or presence (+) of 10 ng/ml of IL-6 
for 4 minutes at 37°C. Cells were then lysed and extracts were either immunoprecipitated using 
anti-mouse gp130 antibody prior to SDS-PAGE (two upper panels) or were electrophoresed 
directly (two lower panels). Gels were blotted and the filters were then probed with 
anti-phosphotyrosine (upper panel), anti-gp130 antibody (second top panel), anti-phospho-STAT3 
(second bottom panel) or anti-STAT3 (lower panel). Blots were visualised using 
peroxidase-conjugated secondary antibodies and Enhanced Chemiluminescence (ECL) reagents. 

Figure 7 is a representation of protein extracts prepared from (A) M1 cells or M1 cells 
expressing SOCS1 (4A2) and (B) M1.mpl cells or M1.mpl.SOCS1 cells incubated for 10 min 
at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml IL-6 or 100 ng/ml IFN-γ. 
The binding reactions contained 4-6 µg protein (constant within a given experiment), 5 ng 
³²P-labelled m67 oligonucleotide encoding the high affinity SIF (c-sis-inducible factor) binding 
site, and 800 ng sonicated salmon sperm DNA. For certain experiments, protein samples were 
preincubated with an excess of unlabelled m67 oligonucleotide, or antibodies specific for either 
STAT1 or STAT3. 

Figure 8 is a photographic representation of Northern hybridisation. Mice were injected 
intravenously with 2 µg and after various periods of time, the livers were removed and polyA+ 
mRNA was purified. M1 cells were stimulated for various lengths of time with 500 ng/ml of 
IL-6, after which polyA+ mRNA was isolated. mRNA was fractionated by electrophoresis and 
immobilized on nylon filters. Northern blots were prehybridized, hybridized with random-primed 
³²P-labelled SOCS1 or GAPDH DNA fragments, washed and exposed to film overnight. 

Figure 9 is a representation of a comparison of the amino acid sequences of SOCS1, SOCS2, 
SOCS3 and CIS. The predicted amino acid sequences of mouse (mm), human (hs) and rat (rr) 
SOCS1, SOCS2, SOCS3 and CIS are aligned. Shaded residues are conserved in three or 
four mouse SOCS family members. The SH2 domain is boxed in solid lines, while the SOCS box 
is bounded by double lines. 



Figure 10 is a photographic representation showing the phenotype of the IL-6-unresponsive M1 
cell clone, 4A2. Colonies of parental M1 cells (left panel) and clone 4A2 (right panel) were 
cultured in semi-solid agar for 7 days in saline or 100 ng/ml IL-6. 

Figure 11 is a photographic representation showing expression of mRNA for SOCS family 
members in vitro and in vivo. 

(A) Northern analysis of mRNA from a range of mouse organs showing constitutive 
expression of SOCS family members in a limited number of tissues. 

(B) Northern analysis of mRNA from liver and M1 cells showing induction of expression of 
SOCS family members following exposure to IL-6. 

(C) Reverse transcriptase PCR analysis of mRNA from bone marrow showing induction of 
expression of SOCS family members by a range of cytokines. 



Figure 12 is a photographic representation showing that SOCS1 suppresses the phosphorylation 
and activation of gp130 and STAT3. 

(A) Western blots of extracts from parental M1 cells (M1 and M1.mpl) and M1 cells 
expressing SOCS1 (4A2 and M1.mpl.SOCS1) stimulated with (+) or without (-) 100 ng/ml IL-6. 
Top: Extracts immunoprecipitated with anti-gp130 (αgp130) and immunoblotted with 
anti-phosphotyrosine (αPY-STAT3), or for STAT3 (αSTAT3) to demonstrate equal loading of 
protein. The molecular weights of the bands are shown on the right. 

(B) EMSA of M1.mpl and M1.mpl.SOCS1 cells stimulated with (+) and without (-) 100 
ng/ml IL-6 or 100 ng/ml IFNγ. The DNA-binding complexes SIF A, B, and C are indicated at 
the left. 

Figure 13 is a representation of a comparison of the amino acid sequences of the SOCS proteins. 
(A) Schematic representation of structures of SOCS proteins including proteins which contain 
WD-40 repeats (WSB) and ankyrin repeats (ASB). (B) Alignment of N-terminal regions of 
SOCS proteins. (C) Alignment of the SH2 domains of CIS, SOCS1, 2, 3, 5, 9, 11 and 14. (D) 
Alignment of the WD-40 repeats of SOCS4, SOCS6, SOCS13 and SOCS15. (E) Alignment of 
the ankyrin repeats of SOCS7 and SOCS10. (F) Alignment of the regions between SH2, WD-40 
and ankyrin repeats and the SOCS box. (G) Alignment of the SOCS box. In each case the 
conventional one letter abbreviations for amino acids are used, with X denoting residues of 
uncertain identity and OOO denoting the beginning and the end of contigs. Amino acid 
sequence obtained from conceptual translation of nucleic acid sequence derived from isolated 
cDNAs is shown in upper case while amino acid sequence obtained by conceptual translation of 
ESTs is shown in lower case and is approximate only. Conserved residues, defined as (LIVMA), 
(FYW), (DE), (QN), (C, S, T), (KRH), (PG), are shaded in the SH2 domain, WD-40 repeats, 
ankyrin repeats and the SOCS box. For the alignment of SH2 domains, WD-40 repeats and 
ankyrin repeats a consensus sequence is shown above. In each case this has been derived from 
examination of a large and diverse set of domains (Neer et al., 1994; Bork, 1993). 
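
By way of illustration only, the conserved residue groups listed for Figure 13 can be expressed 
as a simple lookup. The Python sketch below is not the procedure used to prepare the figure: the 
names CONSERVED_GROUPS, residue_group and column_is_conserved are hypothetical, and the rule 
that an alignment column counts as conserved when every residue of known identity falls in a 
single group is an assumption, since the legend does not state the shading threshold. 

```python
# Conserved residue groups as listed in the legend to Figure 13.
CONSERVED_GROUPS = ["LIVMA", "FYW", "DE", "QN", "CST", "KRH", "PG"]


def residue_group(residue):
    """Return the conserved group containing a one-letter residue, or None."""
    code = residue.upper()  # EST-derived sequence is shown in lower case
    for group in CONSERVED_GROUPS:
        if code in group:
            return group
    return None


def column_is_conserved(column):
    """Assumed rule: all residues of known identity share one group.

    Gap characters ('-') and residues of uncertain identity ('X')
    are ignored.
    """
    residues = [r for r in column if r.upper() not in ("-", "X")]
    if not residues:
        return False
    groups = {residue_group(r) for r in residues}
    return len(groups) == 1 and None not in groups


if __name__ == "__main__":
    print(column_is_conserved("LIVM"))  # all four residues are in (LIVMA)
    print(column_is_conserved("LIF"))   # F belongs to (FYW), not (LIVMA)
```

A stricter or looser threshold (for example, conservation in three of four sequences, as used 
for Figure 9) could be substituted without changing the group definitions. 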

Figures 14(A) and (B) are photographic representations showing analysis of mRNA expression 
of mouse SOCS1 and SOCS5 and SOCS containing a WD-40 repeat (WSB2) and ankyrin 
repeats (ASB1). 



Figure 15 is a representation showing the nucleotide sequence of the mouse SOCS4 cDNA. The 
nucleotides encoding the mature coding region from the predicted ATG "start" codon to the stop 
codon are shown in upper case, while the predicted 5' and 3' untranslated regions are shown in 
lower case. The relationship of mouse cDNA sequence to mouse and human EST contigs is 
illustrated in Figure 17. 



Figure 16 is a representation showing the predicted amino acid sequence of the mouse SOCS4 
protein, derived from the nucleotide sequence in Figure 15. The SOCS box, which is also shown 
in Figure 13, is underlined. 

Figure 18 is a representation showing the nucleotide sequence of human SOCS4 cDNA contigs 
h4.1 and h4.2, derived from analysis of ESTs listed in Table 4.1. The relationship of these 
contigs to the mouse cDNA sequence is illustrated in Figure 17. 

Figure 19 is a diagrammatic representation showing the relationship of mouse SOCS5 genomic 
(57-2) and cDNA (5-3-2) clones to contigs derived from analysis of mouse ESTs (Table 5.1) and 
human cDNA clone (5-94-2) and ESTs (Table 5.2). The nucleotide sequence of the mouse 
SOCS5 contig is shown in Figure 20, with the sequence of human SOCS5 contig (h5.1) being 
shown in Figure 21. The deduced amino acid sequence of mouse SOCS5 is shown in Figure 
20B. The structure of the protein is shown schematically, with the SH2 domain indicated by 
( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are shown by the thin 
solid line. 

Figure 20A is a representation showing the nucleotide sequence of the mouse SOCS5 derived 
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region 
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the 
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of mouse 
cDNA sequence to mouse and human EST contigs is illustrated in Figure 19. 

Figure 20B is a representation of the predicted amino acid sequence of mouse SOCS5 protein, 
derived from the nucleotide sequence in Figure 20A. The SOCS box, which is also shown in 
Figure 13, is underlined. 

Figure 21 is a representation showing the nucleotide sequence of human SOCS5 cDNA contig 
h5.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 5.2. The 
relationship of these contigs to the mouse cDNA sequence is illustrated in Figure 19. 

Figure 22 is a diagrammatic representation showing the relationship of mouse SOCS6 cDNA 
clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N and 6-5N) to contigs derived from analysis 
of mouse ESTs (Table 6.1) and human ESTs (Table 6.2). The nucleotide sequence of the mouse 
SOCS6 contig is shown in Figure 23, with the sequence of human SOCS6 contigs (h6.1 and 
h6.2) being shown in Figure 24. The deduced amino acid sequence of mouse SOCS6 is shown 
in Figure 23B. The structure of the protein is shown schematically, with the WD-40 repeats 
indicated by ( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are 
shown by the thin solid line. 

Figure 23A is a representation showing the nucleotide sequence of the mouse SOCS6 derived 
from analysis of cDNA clone 64-10A-11. The nucleotides encoding part of the predicted 
coding region, ending in the stop codon, are shown in upper case, while the predicted 3' 
untranslated regions are shown in lower case. The relationship of mouse cDNA sequence to 
mouse and human EST contigs is illustrated in Figure 22. 



Figure 23B is a representation showing the predicted amino acid sequence of mouse SOCS6 
protein, derived from the nucleotide sequence in Figure 23A. The SOCS box, which is also shown 
in Figure 13, is underlined. 

Figure 24 is a representation showing the nucleotide sequence of human SOCS6 cDNA contig 
h6.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 6.2. The 
relationship of these contigs to the mouse cDNA sequence is illustrated in Figure 22. 

Figure 25 is a diagrammatic representation showing the relationship of mouse SOCS7 cDNA 
clone (74-10A-11) to contigs derived from analysis of mouse ESTs (Table 7.1) and human ESTs 
(Table 7.2). The nucleotide sequence of the mouse SOCS7 contig is shown in Figure 26 with 
the sequence of human SOCS7 contigs (h7.1 and h7.2) being shown in Figure 27. The deduced 
amino acid sequence of mouse SOCS7 is shown in Figure 26B. The structure of the protein is 
shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box by ( ). The 
putative 5' and 3' untranslated regions are shown by the thin solid line in the mouse and by the 
wavy line in h7.2. Based on analysis of clones isolated to date and ESTs, the 3' untranslated 
regions of mSOCS7 and hSOCS7 share little similarity. 

Figure 26A is a representation showing the nucleotide sequence of the mouse SOCS7 derived 
from analysis of cDNA clone 74-10A-11. The nucleotides encoding part of the predicted 
coding region, ending in the stop codon, are shown in upper case, while the predicted 3' 
untranslated regions are shown in lower case. The relationship of mouse cDNA sequence to 
mouse and human EST contigs is illustrated in Figure 25. 

Figure 26B is a representation showing the predicted amino acid sequence of mouse SOCS7 
protein, derived from the nucleotide sequence in Figure 26A. The SOCS box, which is also shown 
in Figure 13, is underlined. 

Figure 27 is a representation showing the nucleotide sequence of human SOCS7 cDNA contigs 
h7.1 and h7.2, derived from analysis of the ESTs listed in Table 7.2. The relationship of these 
contigs to the mouse cDNA sequence is illustrated in Figure 25. 

Figure 28 is a diagrammatic representation of the relationship of sequence derived from analysis 
of mouse SOCS8 ESTs (Table 8.1 and Figure 29A) to the predicted protein structure of mouse 
SOCS8. The deduced partial amino acid sequence of mouse SOCS8 is shown in Figure 29B. 
The structure of the protein is shown schematically with the SOCS box highlighted ( ). The 
predicted 3' untranslated region is shown by the thin line. 

Figure 29A is a representation showing the partial nucleotide sequence of mouse SOCS8 cDNA 
(contig 8.1) derived from analysis of ESTs. The nucleotides encoding part of the predicted 
coding region, ending in the stop codon, are shown in upper case, while the predicted 3' 
untranslated regions are shown in lower case. 

Figure 29B is a representation showing the partial predicted amino acid sequence of the mouse 
SOCS8 protein, derived from the nucleotide sequence in Figure 29A. The SOCS box, which 
is also shown in Figure 13, is underlined. 

Figure 30 is a diagrammatic representation showing the relationship of mouse SOCS9 ESTs 
(Table 9.1) and human SOCS9 ESTs (Table 9.2). The nucleotide sequence of the mouse SOCS9 
contig (m9.1) is shown in Figure 31, with the sequence of human SOCS9 contig (h9.1) being 
shown in Figure 32. The deduced amino acid sequence of human SOCS9 is shown 
schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The putative 3' 
untranslated region is shown by the thin solid line. 

Figure 31 is a representation showing the partial nucleotide sequence of mouse SOCS9 cDNA 
(contig m9.1), derived from analysis of the ESTs listed in Table 9.1. The relationship of these 
contigs to the mouse cDNA sequence is illustrated in Figure 30. 

Figure 32 is a representation showing the partial nucleotide sequence of human SOCS9 cDNA 
(contig h9.1), derived from analysis of the ESTs listed in Table 9.2. Although it is clear that 
contig h9.1 encodes a protein with an SH2 domain and a SOCS box, the quality of the sequence 
is not high enough to derive a single unambiguous open reading frame. The relationship of these 
contigs to the mouse cDNA sequence is illustrated in Figure 30. 

Figure 33 is a representation showing the relationship of mouse SOCS10 cDNA clones (10-9, 
10-12, 10-23 and 10-24) to contigs derived from analysis of mouse ESTs (Table 10.1) and 
human ESTs (Table 10.2). The nucleotide sequence of the mouse SOCS10 contig is shown in 
Figure 34, with the sequence of human SOCS10 contigs (h10.1 and h10.2) being shown in 
Figure 35. The predicted structure of the protein is shown schematically, with the ankyrin 
repeats indicated by ( ) and the SOCS box by ( ). The putative 3' untranslated regions are shown 
by the thin solid line in the mouse and by the wavy line in h10.2. Based on analysis of clones 
isolated to date and ESTs, the 3' untranslated regions of mSOCS10 and hSOCS10 share little 
similarity. 

Figure 34 is a representation showing the nucleotide sequence of the mouse SOCS10 derived 
from analysis of cDNA clones 10-9, 10-12, 10-23 and 10-24. The nucleotides encoding part 
of the predicted coding region, ending in the stop codon, are shown in upper case, while the 
predicted 3' untranslated regions are shown in lower case. Although it is clear that contig m10.1 
encodes a protein with a series of ankyrin repeats and a SOCS box, the quality of the sequence 
is not high enough to derive a single unambiguous open reading frame. The relationship of 
mouse cDNA sequence to mouse and human EST contigs is illustrated in Figure 33. 

Figure 35 is a representation showing the nucleotide sequence of human SOCS10 cDNA contigs 
h10.1 and h10.2, derived from analysis of the ESTs listed in Table 10.2. The relationship of these 
contigs to the mouse cDNA sequence is illustrated in Figure 33. 

Figure 36A is a representation showing the partial nucleotide sequence of the human SOCS11 
cDNA derived from analysis of ESTs listed in Table 11.1. The nucleotides encoding the mature 
coding region from the predicted ATG "start" codon to the stop codon are shown in upper case, 
while the predicted 5' and 3' untranslated regions are shown in lower case. The relationship of 
the partial cDNA sequence, derived from ESTs, to the predicted protein is shown in Figure 37. 

Figure 36B is a representation showing the partial predicted amino acid sequence of human 
SOCS11 protein, derived from the nucleotide sequence in Figure 36A. The SOCS box, which 
is also shown in Figure 13, is underlined. 

Figure 37 is a diagrammatic representation showing the relationship of sequence derived from 
analysis of human SOCS11 ESTs (Table 11.1 and Figure 36A) to the predicted protein structure 
of human SOCS11. The deduced partial amino acid sequence of human SOCS11 is shown in 
Figure 36B. The structure of the protein is shown schematically with the SH2 domain shown 
by ( ) and the SOCS box highlighted by ( ). The predicted 3' untranslated region is shown by 
the thin line. 

Figure 38 is a diagrammatic representation showing the relationship of mouse SOCS12 cDNA 
clones (12-1) to contigs derived from analysis of mouse ESTs (Table 12.1) and human ESTs 
(Table 12.2). The nucleotide sequence of the mouse SOCS12 contig is shown in Figure 39, 
with the sequence of human SOCS12 contigs (h12.1 and h12.2) being shown in Figure 40. The 
deduced partial amino acid sequence of mouse SOCS12 is shown in Figure 39. The structure 
of the protein is shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box 
by ( ). The putative 3' untranslated region is shown by the thin solid line in the mouse and 
by the wavy line in h12.2. Based on analysis of clones isolated to date and ESTs, the 3' 
untranslated regions of mSOCS12 and hSOCS12 share little similarity. 



Figure 39 is a representation showing the nucleotide sequence of the mouse SOCS12 derived 
from analysis of cDNA clone 12-1 and the ESTs listed in Table 12.1. The nucleotides encoding 
part of the predicted coding region, including the stop codon, are shown in upper case, while 
the predicted 3' untranslated region is shown in lower case. Although, by homology with human 
SOCS12, it is clear that contig m12.1 encodes a protein with a series of ankyrin repeats and a 
SOCS box, the quality of the sequence is not high enough to derive a single unambiguous open 
reading frame. The relationship of mouse cDNA sequence to mouse and human EST contigs is 
illustrated in Figure 38. 

Figure 40 is a representation showing the nucleotide sequence of human SOCS12 cDNA contigs 
h12.1 and h12.2, derived from analysis of the ESTs listed in Table 12.2. The relationship of these 
contigs to the mouse cDNA sequence is illustrated in Figure 38. 



Figure 41 is a diagrammatic representation showing the relationship of contig m13.1 derived 
from analysis of mouse SOCS13 cDNA clones (62-1, 62-6-7, 62-14) and mouse ESTs (Table 
13.1) to contig h13.1 derived from analysis of human ESTs (Table 13.2). The nucleotide 
sequence of the mouse SOCS13 contig is shown in Figure 42, with the sequence of human 
SOCS13 contig (h13.1) being shown in Figure 43. The deduced amino acid sequence of mouse 
SOCS13 is shown in Figure 42B. The structure of the protein is shown schematically, with the 
WD-40 repeats highlighted by ( ) and the SOCS box highlighted by ( ). The 3' untranslated 
region is shown by the thin solid line. 

Figure 42A is a representation showing the nucleotide sequence of the mouse SOCS13 derived 
from analysis of cDNA clones 62-1, 62-6-7 and 62-14. The nucleotides encoding part of the 
predicted coding region, ending in the stop codon, are shown in upper case, while those encoding 
the predicted 3' untranslated regions are shown in lower case. The relationship of mouse cDNA 
sequence to mouse and human EST contigs is illustrated in Figure 41. 

Figure 42B is a representation showing the predicted amino acid sequence of mouse SOCS13 
protein, derived from the nucleotide sequence in Figure 42A. The SOCS box, which is also shown 
in Figure 13, is underlined. 

Figure 43 is a representation showing the nucleotide sequence of human SOCS13 cDNA contig 
h13.1 derived from analysis of the ESTs listed in Table 13.2. The relationship of these contigs 
to the mouse cDNA sequence is illustrated in Figure 41. 

Figure 44 is a diagrammatic representation showing the relationship of a partial mouse SOCS14 
cDNA clone (14-1) to contigs derived from analysis of mouse ESTs (Table 14.1). The 
nucleotide sequence of the mouse SOCS14 contig is shown in Figure 45. The deduced partial 
amino acid sequence of mouse SOCS14 is shown in Figure 45B. The structure of the protein 
is shown schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The 
putative 3' untranslated region is shown by the thin line. 

Figure 45A is a representation showing the nucleotide sequence of the mouse SOCS14 derived 
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region 
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the 
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of mouse 
cDNA sequence to mouse and human EST contigs is illustrated in Figure 44. 

Figure 45B is a representation showing the predicted amino acid sequence of mouse SOCS14 
protein, derived from the nucleotide sequence in Figure 45A. The SOCS box, which is also shown 
in Figure 13, is underlined. 

Figure 46 is a diagrammatic representation showing the relationship of contig m15.1 derived 
from analysis of mouse BAC and mouse ESTs (Table 15.1) to contig h15.1 derived from analysis 
of the human BAC and human ESTs (Table 15.2). The nucleotide sequence of the mouse 
SOCS15 contig is shown in Figure 47, with the sequence of human SOCS15 contig (h15.1) 
being shown in Figure 48. The deduced amino acid sequence of mouse SOCS15 is shown in 
Figure 47B. The structure of the protein is shown schematically, with the WD-40 repeats 
highlighted by ( ) and the SOCS box highlighted by ( ). The 5' and 3' untranslated regions are 
shown by the thin solid line. The introns which interrupt the coding region are shown by A. 

Figure 47A is a representation showing the nucleotide sequence covering the mouse SOCS15 
gene derived from analysis of the mouse BAC listed in Table 15.1. The nucleotides encoding the 
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in 
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3' 
untranslated region are shown in lower case. The relationship of mouse BAC to mouse and 
human EST contigs is illustrated in Figure 46. 



Figure 47B is a representation showing the predicted amino acid sequence of mouse SOCS15 
protein, derived from the nucleotide sequence in Figure 47A. The SOCS box, which is also shown 
in Figure 13, is underlined. 

Figure 48A is a representation showing the nucleotide sequence covering the human SOCS15 
gene derived from analysis of the human BAC listed in Table 15.2. The nucleotides encoding the 
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in 
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3' 
untranslated region are shown in lower case. The relationship of the human BAC to mouse and 
human EST contigs is illustrated in Figure 46. 

Figure 48B is a representation showing the predicted amino acid sequence of human SOCS15 
protein, derived from the nucleotide sequence in Figure 48A. The SOCS box, which is also shown 
in Figure 13, is underlined. 



Figure 49 is a photographic representation showing SOCS1 inhibition of JAK2 kinase activity. 
(A) Upper panel. Cos M6 cells were transiently transfected with either Flag-tagged mJAK2 and 
mSOCS-1 DNA (SOCS1) or Flag-mJAK2 DNA alone (-), lysed, JAK2 proteins 
immunoprecipitated using anti-JAK2 antibody and subjected to an in vitro kinase assay. Lower 
panel. A portion of the JAK2 immunoprecipitates was Western blotted with anti-JAK2 
antibody. (B) Upper panel. Cos M6 cells were transiently transfected with Flag-mJAK2 and 
Flag-mSOCS-1 DNA or Flag-mJAK2 DNA alone, lysed, JAK2 proteins immunoprecipitated 
using anti-JAK2 (UBI) and separated by SDS/PAGE gel. Immunoprecipitates were then 
analysed by Western blot with anti-phosphotyrosine antibody. Lower panel. JAK2 expression. 
Cos cell lysates were separated by SDS/PAGE gel and analysed by Western blot with anti-FLAG 
antibody (M2). 

Figure 50 is a photographic representation showing interaction between JAK2 and SOCS 
protein. (A) Cos M6 cells were transiently transfected with Flag-tagged mJAK2 and various 
Flag-tagged SOCS DNAs (SOCS-1; S1, SOCS-2; S2, SOCS-3; S3, CIS) or Flag-mJAK2 alone, 
lysed, JAK2 proteins immunoprecipitated using anti-JAK2 (UBI) and separated by SDS/PAGE. 
Immunoprecipitates were then analysed by Western blot with anti-FLAG antibody (M2). (B) 
Cos cell lysates described in (A) were separated by SDS/PAGE and expression levels of the 
various proteins were determined by Western blot with anti-FLAG antibody (M2). (C) JAK2 
tyrosine phosphorylation. Cos cell lysates described in (A) were separated by SDS/PAGE and 
proteins analysed by Western blot with anti-phosphotyrosine antibody. 

Figure 51 is a diagrammatic representation of pβgalpAloxneo. 

Figure 52 is a diagrammatic representation of pβgalpAloxneoTK. 

Figure 53 is a diagrammatic representation of SOCS1 knockout construct. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

The present invention provides a new family of modulators of signal transduction. As the initial 
members of this family suppressed cytokine signalling, the family is referred to as the 
"suppressors of cytokine signalling" family or "SOCS". The SOCS family is defined by the 
presence of a C-terminal domain referred to as a "SOCS box". Different classes of SOCS 
molecules are defined by a motif generally but not exclusively located N-terminal to the SOCS 
box and which is involved in protein:molecule interaction such as protein:DNA or 
protein:protein interaction. Particularly preferred motifs are selected from an SH2 domain, 
WD-40 repeats and ankyrin repeats. 

WD-40 repeats were originally recognised in the β-subunit of G-proteins. WD-40 repeats appear 
to form a β-propeller-like structure and may be involved in protein-protein interactions. Ankyrin 
repeats were originally recognised in the cytoskeletal protein ankyrin. 



Members of the SOCS family may be identified by any number of means. For example, SOCS1 
to SOCS3 were identified by their ability to suppress cytokine-mediated signal transduction and, 
hence, were identified based on activity. SOCS4 to SOCS15 were identified as nucleotide 
sequences exhibiting similarity at the level of the SOCS box. 



The SOCS box is a conserved motif located in the C-terminal region of the SOCS molecule. In 
accordance with the present invention, the amino acid sequence of the SOCS box is: 



X₁ X₂ X₃ X₄ X₅ X₆ X₇ X₈ X₉ X₁₀ X₁₁ X₁₂ X₁₃ X₁₄ X₁₅ X₁₆ [Xᵢ]ₙ X₁₇ X₁₈ X₁₉ X₂₀ 
X₂₁ X₂₂ X₂₃ [Xⱼ]ₙ X₂₄ X₂₅ X₂₆ X₂₇ X₂₈ 



wherein: 



X₁ is L, I, V, M, A or P; 
X₂ is any amino acid residue; 
X₃ is P, T or S; 
X₄ is L, I, V, M, A or P; 
X₅ is any amino acid; 
X₆ is any amino acid; 
X₇ is L, I, V, M, A, F, Y or W; 
X₈ is C, T or S; 
X₉ is R, K or H; 
X₁₀ is any amino acid; 
X₁₁ is any amino acid; 
X₁₂ is L, I, V, M, A or P; 
X₁₃ is any amino acid; 
X₁₄ is any amino acid; 
X₁₅ is any amino acid; 
X₁₆ is L, I, V, M, A, P, G, C, T or S; 
[Xᵢ]ₙ is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xᵢ may comprise the same or different amino 
acids selected from any amino acid residue; 
X₁₇ is L, I, V, M, A or P; 
X₁₈ is any amino acid; 
X₁₉ is any amino acid; 
X₂₀ is L, I, V, M, A or P; 
X₂₁ is P; 
X₂₂ is L, I, V, M, A, P or G; 
X₂₃ is P or N; 
[Xⱼ]ₙ is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xⱼ may comprise the same or different amino 
acids selected from any amino acid residue; 
X₂₄ is L, I, V, M, A or P; 
X₂₅ is any amino acid; 
X₂₆ is any amino acid; 
X₂₇ is Y or F; and 
X₂₈ is L, I, V, M, A or P. 
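
By way of illustration only, the SOCS box definition above can be transcribed into a regular 
expression. The Python sketch below is one possible reading of the definition and is not a 
validated search tool: the names ANY, GAP, SOCS_BOX_PATTERN and find_socs_box are hypothetical, 
"any amino acid" is assumed to mean the 20 standard one-letter codes, and each of the two 
variable stretches is allowed 1 to 50 residues. 

```python
import re

# Assumption: "any amino acid" means the 20 standard one-letter codes.
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
# Variable stretch of 1 to 50 residues; lazy so the shortest fit is tried first.
GAP = ANY + "{1,50}?"

SOCS_BOX_POSITIONS = [
    "[LIVMAP]",      # X1
    ANY,             # X2
    "[PTS]",         # X3
    "[LIVMAP]",      # X4
    ANY, ANY,        # X5, X6
    "[LIVMAFYW]",    # X7
    "[CTS]",         # X8
    "[RKH]",         # X9
    ANY, ANY,        # X10, X11
    "[LIVMAP]",      # X12
    ANY, ANY, ANY,   # X13, X14, X15
    "[LIVMAPGCTS]",  # X16
    GAP,             # first variable stretch of n residues
    "[LIVMAP]",      # X17
    ANY, ANY,        # X18, X19
    "[LIVMAP]",      # X20
    "P",             # X21
    "[LIVMAPG]",     # X22
    "[PN]",          # X23
    GAP,             # second variable stretch of n residues
    "[LIVMAP]",      # X24
    ANY, ANY,        # X25, X26
    "[YF]",          # X27
    "[LIVMAP]",      # X28
]

SOCS_BOX_PATTERN = re.compile("".join(SOCS_BOX_POSITIONS))


def find_socs_box(protein_sequence):
    """Return the first regex match for the consensus, or None."""
    return SOCS_BOX_PATTERN.search(protein_sequence.upper())


if __name__ == "__main__":
    # Synthetic sequence built to satisfy each position; not a real SOCS protein.
    synthetic = "LAPLAALCRAALAAAL" + "G" + "LAALPLP" + "G" + "LAAYL"
    match = find_socs_box(synthetic)
    print(match.span() if match else "no SOCS box found")
```

A match under this pattern indicates only that a sequence is consistent with the consensus as 
written; it does not by itself establish membership of the SOCS family. 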






As stated above and in accordance with the present invention, SOCS proteins are divided into 
separate classes based on the presence of a protein:molecule interacting region such as but not 
limited to an SH2 domain, WD-40 repeats and ankyrin repeats located N-terminal of the SOCS 
box. The latter three domains are protein:protein interacting domains. 

Examples of SH2 containing SOCS proteins include SOCS1, SOCS2, SOCS3, SOCS5, SOCS9, 
SOCS11 and SOCS14. Examples of SOCS containing WD-40 repeats include SOCS4, SOCS6 
and SOCS15. Examples of SOCS containing ankyrin repeats include SOCS7, SOCS10 and 
SOCS12. 

The present invention provides inter alia nucleic acid molecules encoding SOCS proteins, 
purified naturally occurring SOCS proteins as well as recombinant forms of SOCS proteins and 
methods of modulating signal transduction by modulating activity of SOCS proteins or 
expression of SOCS genes. Preferably, signal transduction is mediated by a cytokine, examples 
of which include EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, 
IFNγ, TNFα, IL-1 and/or M-CSF. Particularly preferred cytokines include IL-6, LIF, OSM, 
IFN-γ and/or thrombopoietin. 

Accordingly, one aspect of the present invention provides an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a 
protein or a derivative, homologue, analogue or mimetic thereof or comprising a nucleotide 
sequence capable of hybridizing thereto under low stringency conditions at 42°C wherein said 
protein comprises a SOCS box in its C-terminal region and optionally a protein:molecule 
interacting domain N-terminal of the SOCS box. 

Preferably, the protein:molecule interacting domain is a protein:DNA or protein:protein 
interacting domain. Most preferably, the protein:molecule interacting domain is one of an SH2 
domain, WD-40 repeats and/or ankyrin repeats. 

As stated above, preferably the subject SOCS modulate cytokine-mediated signal transduction. 
The present invention extends, however, to SOCS molecules modulating other effector-mediated 
signal transduction such as that mediated by other endogenous or exogenous molecules, antigens, 
microbes and microbial products, viruses or components thereof, ions, hormones and parasites. 
Endogenous molecules in this context are molecules produced within the cell carrying the SOCS 
molecule. Exogenous molecules are produced by other cells or are introduced to the body. 

Preferably, the nucleic acid molecule or SOCS protein is in isolated or purified form. The terms 
"isolated" and "purified" mean that a molecule has undergone at least one purification step away 
from other material. 

Preferably, the nucleic acid molecule is in isolated form and is DNA such as cDNA or genomic 
DNA. The DNA may encode the same amino acid sequence as the naturally occurring SOCS 
or the SOCS may contain one or more amino acid substitutions, deletions and/or additions. The 
nucleotide sequence may correspond to the genomic coding sequence (including exons and 
introns) or to the nucleotide sequence in cDNA from mRNA transcribed from the genomic gene 
or it may carry one or more nucleotide substitutions, deletions and/or additions thereto. 



In a preferred embodiment, the nucleic acid molecule comprises a sequence of nucleotides 
encoding or complementary to a sequence encoding a SOCS protein or a derivative, homologue, 
analogue or mimetic thereof wherein the amino acid sequence of said SOCS protein is selected 
from SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID 
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18 
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29 
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44 
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15) or encodes an amino 
acid sequence with a single or multiple amino acid substitution, deletion and/or addition to the 
listed sequences or is a nucleotide sequence capable of hybridizing to the nucleic acid molecule 
under low stringency conditions at 42°C. 

In an even more preferred embodiment, the present invention provides a nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a 
SOCS protein or a derivative, homologue, analogue or mimetic thereof wherein the nucleotide 
sequence is selected from a nucleotide sequence substantially set forth in SEQ ID NO:3 
(mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ ID NO:9 (hSOCS1), 
SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15 and SEQ ID NO:16 
(hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ ID NO:20 (mSOCS6), 
SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24 (mSOCS7), SEQ ID NO:26 and 
SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ ID NO:30 (mSOCS9), SEQ ID 
NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33 and SEQ ID NO:34 
(hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12), SEQ ID NO:38 and 
SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42 (hSOCS13), SEQ 
ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47 (hSOCS15) or a 
nucleotide sequence having at least about 15% similarity to all or a region of any of the listed 
sequences or a nucleic acid molecule capable of hybridizing to any of the listed sequences under 
low stringency conditions at 42°C. 

Reference herein to a low stringency at 42°C includes and encompasses from at least about 1% 
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for 
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative 
stringency conditions may be applied where necessary, such as medium stringency, which 
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and 
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M 
to at least about 0.9M salt for washing conditions, or high stringency, which includes and 
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least 
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least 
about 0.15M salt for washing conditions. 
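
By way of illustration only, the stringency ranges recited above can be expressed as a small lookup. The following Python sketch is not part of the specification: the function name `classify_stringency` and the treatment of the "about" limits as hard inclusive bounds are assumptions made for the example.

```python
# Illustrative sketch: the numeric ranges are those recited in the text above;
# treating them as hard inclusive bounds is an assumption of this example.
STRINGENCY_RANGES = {
    "low": {"formamide": (1.0, 15.0), "salt": (1.0, 2.0)},
    "medium": {"formamide": (16.0, 30.0), "salt": (0.5, 0.9)},
    "high": {"formamide": (31.0, 50.0), "salt": (0.01, 0.15)},
}


def classify_stringency(formamide_pct_vv, salt_molar):
    """Return 'low', 'medium' or 'high' for a hybridisation condition.

    formamide_pct_vv: formamide concentration in % v/v.
    salt_molar: salt concentration in mol/L.
    Returns None when the pair falls outside every recited range.
    """
    for name, limits in STRINGENCY_RANGES.items():
        f_lo, f_hi = limits["formamide"]
        s_lo, s_hi = limits["salt"]
        if f_lo <= formamide_pct_vv <= f_hi and s_lo <= salt_molar <= s_hi:
            return name
    return None
```

A condition that mixes ranges, for example low-stringency formamide with high-stringency salt, is deliberately reported as unclassified rather than guessed.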

In another embodiment, the present invention is directed to a SOCS protein or a derivative, 
homologue, analogue or mimetic thereof wherein said SOCS protein is identified as follows: 

human SOCS-4 characterised by EST81149, EST180909, EST182619, ya99H09, 
ye70c04, yh53c09, yh77g11, yh87h05, yi45h07, yj04e06, yq12h06, yq56a06, yq60e02, 
yq92g03, yq97h06, yr90f01, yt69c03, yv30a08, yv55f07, yv57h09, yv87h02, yv98e11, 
yw68d10, yw82a03, yx08a07, yx72h06, yx76b09, yy37h08, yy66b02, za81f08, zb18f07, 
zc06e08, zd14g06, zd51h12, zd52b09, ze25g11, ze69f02, zf54f03, zh96e07, zv66h12, 
zs83a08 and zs83g08; 

mouse SOCS-4 characterised by mc65f04, mf42e06, mp10c10, mr81g09, and mt19h12; 

human SOCS-5 characterised by EST15B103, EST15B105, EST27530 and zf50f01; 

mouse SOCS-5 characterised by mc55a01, mh98f09, my26h12 and ve24e06; 

human SOCS-6 characterised by yf61e08, yf93a09, yg05f12, yg41f04, yg45c02, 
yh11f10, yh13b05, zc35a12, ze02h08, zl09a03, zl69e10, zn39d08 and zo39e06; 

mouse SOCS-6 characterised by mc04c05, md48a03, mf31d03, mh26b07, mh78e11, 
mh88h09, mh94h07, mi27h04 and mj29c05; mp66g04, mw75g03, va53b05, vb34h02, 
vc55d07, vc59e05, vc67d03, vc68d10, vc97h01, vc99c08, vd07h03, vd08c01, vd09b12, 
vd19b02, vd29a04 and vd46d06; 

human SOCS-7 characterised by STS WI30171, EST00939, EST12913, yc29b05, 
yp49f10, zt10f03 and zx73g04; 

mouse SOCS-7 characterised by mj39a01 and vi52h07; 

mouse SOCS-8 characterised by mj6e09 and vj27a029; 

human SOCS-9 characterised by CSRL-82f2-u, EST114054, yy06b07, yy06g06, 
zr40c09, zr72h01, yx92c08, yx93b08 and hfe0662; 

mouse SOCS-9 characterised by me65d05; 

human SOCS-10 characterised by aa48h10, zp35h01, zp97h12, zq08h01, zr34g05, 
EST73000 and HSDHEI005; 





WO 98/20023 PCT/AU97/00729 


mouse SOCS-10 characterised by mb14d12, mb40f06, mg89b11, mq89e12, mp03g12 
and vh53c11; 

human SOCS-11 characterised by zt24h06 and zr43b02; 

human SOCS-13 characterised by EST59161; 

mouse SOCS-13 characterised by ma39a09, me60c05, mi78g05, mk10c11, mo48g12, 
mp94a01, vb57c07 and vh07c11; and 

human SOCS-14 characterised by mi75e03, vd29h11 and vd53g07; 

or a derivative or homologue of the above ESTs characterised by a nucleic acid molecule 
being capable of hybridizing to any of the listed ESTs under low stringency conditions 
at 42°C. 

15 

In another embodiment, the nucleotide sequence encodes the following amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28 

wherein: 
    X1 is L, I, V, M, A or P; 
    X2 is any amino acid residue; 
    X3 is P, T or S; 
    X4 is L, I, V, M, A or P; 
    X5 is any amino acid; 
    X6 is any amino acid; 
    X7 is L, I, V, M, A, F, Y or W; 
    X8 is C, T or S; 
    X9 is R, K or H; 
    X10 is any amino acid; 
    X11 is any amino acid; 
    X12 is L, I, V, M, A or P; 
    X13 is any amino acid; 
    X14 is any amino acid; 
    X15 is any amino acid; 
    X16 is L, I, V, M, A, P, G, C, T or S; 
    [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
        and wherein the sequence Xi may comprise the same or different amino 
        acids selected from any amino acid residue; 
    X17 is L, I, V, M, A or P; 
    X18 is any amino acid; 
    X19 is any amino acid; 
    X20 is L, I, V, M, A or P; 
    X21 is P; 
    X22 is L, I, V, M, A, P or G; 
    X23 is P or N; 
    [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
        and wherein the sequence Xj may comprise the same or different amino 
        acids selected from any amino acid residue; 
    X24 is L, I, V, M, A or P; 
    X25 is any amino acid; 
    X26 is any amino acid; 
    X27 is Y or F; and 
    X28 is L, I, V, M, A or P. 
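
For illustration, the consensus defined above lends itself to a pattern search over a candidate protein sequence. The following Python sketch is a minimal example and not a method taken from the specification: the names `SOCS_BOX_RE` and `find_socs_box` are hypothetical, "any amino acid" is assumed to mean the twenty standard residues in one-letter code, and the test sequence is synthetic.

```python
import re

# Assumption: "any amino acid" is restricted to the 20 standard residues.
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
LIVMAP = "[LIVMAP]"

# One entry per position of the consensus, in order X1 ... X28.
SOCS_BOX_PARTS = [
    LIVMAP,            # X1
    ANY,               # X2
    "[PTS]",           # X3
    LIVMAP,            # X4
    ANY, ANY,          # X5, X6
    "[LIVMAFYW]",      # X7
    "[CTS]",           # X8
    "[RKH]",           # X9
    ANY, ANY,          # X10, X11
    LIVMAP,            # X12
    ANY, ANY, ANY,     # X13, X14, X15
    "[LIVMAPGCTS]",    # X16
    ANY + "{1,50}",    # [Xi]n, n = 1 to 50
    LIVMAP,            # X17
    ANY, ANY,          # X18, X19
    LIVMAP,            # X20
    "P",               # X21
    "[LIVMAPG]",       # X22
    "[PN]",            # X23
    ANY + "{1,50}",    # [Xj]n, n = 1 to 50
    LIVMAP,            # X24
    ANY, ANY,          # X25, X26
    "[YF]",            # X27
    LIVMAP,            # X28
]
SOCS_BOX_RE = re.compile("".join(SOCS_BOX_PARTS))


def find_socs_box(protein_sequence):
    """Return the (start, end) span of the first consensus match, or None."""
    match = SOCS_BOX_RE.search(protein_sequence.upper())
    return match.span() if match else None
```

A match only shows that a sequence is compatible with the consensus; it says nothing about whether the protein functions as a SOCS.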

The above sequence comparisons are preferably to the whole molecule but may also be to part 
thereof. Preferably, the comparisons are made to a contiguous series of at least about 21 
nucleotides or at least about 5 amino acids. More preferably, the comparisons are made against 
at least about 21 contiguous nucleotides or at least 7 contiguous amino acids. Comparisons may 
also only be made to the SOCS box region or a region encompassing the protein:molecule 
interacting region such as the SH2 domain, WD-40 repeats and/or ankyrin repeats. 

Still another embodiment of the present invention contemplates an isolated polypeptide or a 
derivative, homologue, analogue or mimetic thereof comprising a SOCS box in its C-terminal 
region. 

Preferably the polypeptide further comprises a protein:molecule interacting domain such as a 
protein:DNA or protein:protein interacting domain. Preferably, this domain is located N-terminal 
of the SOCS box. It is particularly preferred for the protein:molecule interacting domain to be 
at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats. 

Preferably, the signal transduction is mediated by a cytokine selected from EPO, TPO, G-CSF, 
GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF. 
Preferred cytokines are IL-6, LIF, OSM, IFN-γ or thrombopoietin. 

More preferably, the protein comprises a SOCS box having the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28 

wherein: 
    X1 is L, I, V, M, A or P; 
    X2 is any amino acid residue; 
    X3 is P, T or S; 
    X4 is L, I, V, M, A or P; 
    X5 is any amino acid; 
    X6 is any amino acid; 
    X7 is L, I, V, M, A, F, Y or W; 
    X8 is C, T or S; 
    X9 is R, K or H; 
    X10 is any amino acid; 
    X11 is any amino acid; 
    X12 is L, I, V, M, A or P; 
    X13 is any amino acid; 
    X14 is any amino acid; 
    X15 is any amino acid; 
    X16 is L, I, V, M, A, P, G, C, T or S; 
    [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
        and wherein the sequence Xi may comprise the same or different amino 
        acids selected from any amino acid residue; 
    X17 is L, I, V, M, A or P; 
    X18 is any amino acid; 
    X19 is any amino acid; 
    X20 is L, I, V, M, A or P; 
    X21 is P; 
    X22 is L, I, V, M, A, P or G; 
    X23 is P or N; 
    [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
        and wherein the sequence Xj may comprise the same or different amino 
        acids selected from any amino acid residue; 
    X24 is L, I, V, M, A or P; 
    X25 is any amino acid; 
    X26 is any amino acid; 
    X27 is Y or F; and 
    X28 is L, I, V, M, A or P. 



Still another embodiment provides an isolated polypeptide or a derivative, homologue, analogue 
or mimetic thereof comprising a sequence of amino acids substantially as set forth in SEQ ID 
NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID NO:10 
(hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18 (mSOCS5), 
SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29 (mSOCS8), SEQ ID 
NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44 (mSOCS14), SEQ ID NO:46 
(mSOCS15) and SEQ ID NO:48 (hSOCS15) or an amino acid sequence having at least 15% 
similarity to all or a part of the listed sequences. 

Preferred nucleotide percentage similarities include at least about 20%, at least about 40%, at 
least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90% 
or above such as 93%, 95%, 98% or 99%. 

Preferred amino acid similarities include at least about 20%, at least about 30%, at least about 
40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least 
about 90%, at least about 95%, at least about 97% or 98% or above. 

As stated above, similarity may be measured against an entire molecule or a region comprising 
at least 21 nucleotides or at least 7 amino acids. Preferably, similarity is measured in a conserved 
region such as SH2 domain, WD-40 repeats, ankyrin repeats or other protein:molecule 
interacting domains or a SOCS box. 

The term "similarity" includes exact identity between sequences or, where the sequence differs, 
different amino acids are related to each other at the structural, functional, biochemical and/or 
conformational levels. 

The nucleic acid molecule may be isolated from any animal such as humans, primates, livestock 
animals (e.g. horses, cows, sheep, donkeys, pigs), laboratory test animals (e.g. mice, rats, rabbits, 
hamsters, guinea pigs), companion animals (e.g. dogs, cats) or captive wild animals (e.g. deer, 
foxes, kangaroos). 

The terms "derivatives" or its singular form "derivative" whether in relation to a nucleic acid 
molecule or a protein includes parts, mutants, fragments and analogues as well as hybrid or 
fusion molecules and glycosylation variants. Particularly useful derivatives comprise single or 
multiple amino acid substitutions, deletions and/or additions to the SOCS amino acid sequence. 
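
As a worked illustration of the definition above, percentage similarity over two pre-aligned, ungapped amino acid sequences may be sketched as follows. The specification does not state which residues count as "related"; the grouping in `RELATED_GROUPS` is a hypothetical choice made only for this example, as is the function name `percent_similarity`.

```python
# Hypothetical grouping of "related" residues; the specification does not
# define one, so this table is an assumption of the example only.
RELATED_GROUPS = [
    set("LIVMA"),  # small and aliphatic
    set("FYW"),    # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("STNQ"),   # polar uncharged
]


def percent_similarity(seq_a, seq_b):
    """Percentage of aligned positions that are identical or related.

    Both sequences must already be aligned and of equal, non-zero length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or any(a in group and b in group for group in RELATED_GROUPS):
            similar += 1
    return 100.0 * similar / len(seq_a)
```

In practice an alignment step and an established substitution matrix would normally replace the fixed groups used here.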

Preferably, the derivatives have functional activity or alternatively act as antagonists or agonists. 
The present invention further extends to homologues of SOCS which include the functionally or 
structurally related molecule from different animal species. The present invention also 
encompasses analogues and mimetics. Mimetics include a class of molecule generally but not 
necessarily having a non-amino acid structure and which functionally are capable of acting in an 
analogous manner to the protein for which it is a mimic, in this case, a SOCS. Mimetics may 
comprise a carbohydrate, aromatic ring, lipid or other complex chemical structure or may also 
be proteinaceous in composition. Mimetics as well as agonists and antagonists contemplated 
herein are conveniently located through systematic searching of environments, such as coral, 
marine and freshwater river beds, flora and microorganisms. This is sometimes referred to as 
natural product screening. Alternatively, libraries of synthetic chemical compounds may be 
screened for potentially useful molecules. 

As stated above, the present invention contemplates agonists and antagonists of the SOCS. One 
example of an antagonist is an antisense oligonucleotide sequence. Useful oligonucleotides are 
those which have a nucleotide sequence complementary to at least a portion of the 
protein-coding or "sense" sequence of the nucleotide sequence. These antisense nucleotides can be 
used to effect the specific inhibition of gene expression. The antisense approach can cause 
inhibition of gene expression apparently by forming an anti-parallel duplex by complementary 
base pairing between the antisense construct and the targeted mRNA, presumably resulting in 
hybridisation arrest of translation. Ribozymes and co-suppression molecules may also be used. 
Antisense and other nucleic acid molecules may first need to be chemically modified to permit 
penetration of cell membranes and/or to increase their serum half life or otherwise make them 
more stable for in vivo administration. Antibodies may also act as either antagonists or agonists 
although are more useful in diagnostic applications or in the purification of SOCS proteins. 
Antagonists and agonists may also be identified following natural product screening or 
screening of libraries of chemical compounds or may be derivatives or analogues of the SOCS 
molecules. 
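
By way of example only, an antisense oligonucleotide complementary to a portion of a sense sequence is the reverse complement of that portion, which gives the anti-parallel duplex described above. The following Python sketch assumes a DNA oligonucleotide written 5' to 3'; the function name `antisense_oligo` and its parameters are hypothetical.

```python
# Complement table for DNA bases (A<->T, C<->G).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(sense_sequence, start, length):
    """Return the antisense DNA oligonucleotide (5' to 3') for a window.

    sense_sequence: protein-coding ("sense") strand, written 5' to 3'.
    start: zero-based index of the first targeted nucleotide.
    length: number of targeted nucleotides.
    """
    sense = sense_sequence.upper()
    if start < 0 or length <= 0 or start + length > len(sense):
        raise ValueError("window lies outside the sense sequence")
    window = sense[start:start + length]
    if set(window) - set("ACGT"):
        raise ValueError("window contains a non-ACGT character")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return window.translate(_DNA_COMPLEMENT)[::-1]
```

The sketch deals only with base pairing; the chemical modifications mentioned above for membrane penetration and stability are outside its scope.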

Accordingly, the present invention extends to analogues of the SOCS proteins of the present 
invention. Analogues may be used, for example, in the treatment or prophylaxis of cytokine 
mediated dysfunction such as autoimmunity, immune suppression or hyperactive immunity or 
other condition including but not limited to dysfunctions in the haemopoietic, endocrine, hepatic 
and neural systems. Dysfunctions mediated by other signal transducing elements such as 
hormones or endogenous or exogenous molecules, antigens, microbes and microbial products, 
viruses or components thereof, ions, hormones and parasites are also contemplated by the 
present invention. 



Analogues of the proteins contemplated herein include, but are not limited to, modification to 
side chains, incorporation of unnatural amino acids and/or their derivatives during peptide, 
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose 
conformational constraints on the proteinaceous molecule or their analogues. 

Examples of side chain modifications contemplated by the present invention include 
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde 
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic 
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups 
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic 
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with 
pyridoxal-5-phosphate followed by reduction with NaBH4. 

The guanidine group of arginine residues may be modified by the formation of heterocyclic 
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal. 

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation 
followed by subsequent derivatisation, for example, to a corresponding amide. 



Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid 
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides 
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted 
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate, 
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and 
other mercurials; carbamoylation with cyanate at alkaline pH. 

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or 
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides. 
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form 
a 3-nitrotyrosine derivative. 

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with 
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate. 

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis 
include, but are not limited to, use of norleucine, 4-amino butyric acid, 
4-amino-3-hydroxy-5-phenylpentanoic acid, 6-aminohexanoic acid, t-butylglycine, norvaline, 
phenylglycine, ornithine, sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine 
and/or D-isomers of amino acids. A list of unnatural amino acids contemplated herein is shown 
in Table 3. 

TABLE 3 

Non-conventional amino acid     Code      Non-conventional amino acid       Code

α-aminobutyric acid             Abu       L-N-methylalanine                 Nmala
α-amino-α-methylbutyrate        Mgabu     L-N-methylarginine                Nmarg
aminocyclopropane-carboxylate   Cpro      L-N-methylasparagine              Nmasn
                                          L-N-methylaspartic acid           Nmasp
aminoisobutyric acid            Aib       L-N-methylcysteine                Nmcys
aminonorbornyl-carboxylate      Norb      L-N-methylglutamine               Nmgln
                                          L-N-methylglutamic acid           Nmglu
cyclohexylalanine               Chexa     L-N-methylhistidine               Nmhis
cyclopentylalanine              Cpen      L-N-methylisoleucine              Nmile
D-alanine                       Dal       L-N-methylleucine                 Nmleu
D-arginine                      Darg      L-N-methyllysine                  Nmlys
D-aspartic acid                 Dasp      L-N-methylmethionine              Nmmet
D-cysteine                      Dcys      L-N-methylnorleucine              Nmnle
D-glutamine                     Dgln      L-N-methylnorvaline               Nmnva
D-glutamic acid                 Dglu      L-N-methylornithine               Nmorn
D-histidine                     Dhis      L-N-methylphenylalanine           Nmphe
D-isoleucine                    Dile      L-N-methylproline                 Nmpro
D-leucine                       Dleu      L-N-methylserine                  Nmser
D-lysine                        Dlys      L-N-methylthreonine               Nmthr
D-methionine                    Dmet      L-N-methyltryptophan              Nmtrp
D-ornithine                     Dorn      L-N-methyltyrosine                Nmtyr
D-phenylalanine                 Dphe      L-N-methylvaline                  Nmval
D-proline                       Dpro      L-N-methylethylglycine            Nmetg
D-serine                        Dser      L-N-methyl-t-butylglycine         Nmtbug
D-threonine                     Dthr      L-norleucine                      Nle
D-tryptophan                    Dtrp      L-norvaline                       Nva
D-tyrosine                      Dtyr      α-methyl-aminoisobutyrate         Maib
D-valine                        Dval      α-methyl-γ-aminobutyrate          Mgabu
D-α-methylalanine               Dmala     α-methylcyclohexylalanine         Mchexa
D-α-methylarginine              Dmarg     α-methylcyclopentylalanine        Mcpen
D-α-methylasparagine            Dmasn     α-methyl-α-naphthylalanine        Manap
D-α-methylaspartate             Dmasp     α-methylpenicillamine             Mpen
D-α-methylcysteine              Dmcys     N-(4-aminobutyl)glycine           Nglu
D-α-methylglutamine             Dmgln     N-(2-aminoethyl)glycine           Naeg
D-α-methylhistidine             Dmhis     N-(3-aminopropyl)glycine          Norn
D-α-methylisoleucine            Dmile     N-amino-α-methylbutyrate          Nmaabu
D-α-methylleucine               Dmleu     α-naphthylalanine                 Anap
D-α-methyllysine                Dmlys     N-benzylglycine                   Nphe
D-α-methylmethionine            Dmmet     N-(2-carbamylethyl)glycine        Ngln
D-α-methylornithine             Dmorn     N-(carbamylmethyl)glycine         Nasn
D-α-methylphenylalanine         Dmphe     N-(2-carboxyethyl)glycine         Nglu
D-α-methylproline               Dmpro     N-(carboxymethyl)glycine          Nasp
D-α-methylserine                Dmser     N-cyclobutylglycine               Ncbut
D-α-methylthreonine             Dmthr     N-cycloheptylglycine              Nchep
D-α-methyltryptophan            Dmtrp     N-cyclohexylglycine               Nchex
D-α-methyltyrosine              Dmty      N-cyclodecylglycine               Ncdec
D-α-methylvaline                Dmval     N-cyclododecylglycine             Ncdod
D-N-methylalanine               Dnmala    N-cyclooctylglycine               Ncoct
D-N-methylarginine              Dnmarg    N-cyclopropylglycine              Ncpro
D-N-methylasparagine            Dnmasn    N-cycloundecylglycine             Ncund
D-N-methylaspartate             Dnmasp    N-(2,2-diphenylethyl)glycine      Nbhm
D-N-methylcysteine              Dnmcys    N-(3,3-diphenylpropyl)glycine     Nbhe
D-N-methylglutamine             Dnmgln    N-(3-guanidinopropyl)glycine      Narg
D-N-methylglutamate             Dnmglu    N-(1-hydroxyethyl)glycine         Nthr
D-N-methylhistidine             Dnmhis    N-(hydroxyethyl)glycine           Nser
D-N-methylisoleucine            Dnmile    N-(imidazolylethyl)glycine        Nhis
D-N-methylleucine               Dnmleu    N-(3-indolylethyl)glycine         Nhtrp
D-N-methyllysine                Dnmlys    N-methyl-γ-aminobutyrate          Nmgabu
N-methylcyclohexylalanine       Nmchexa   D-N-methylmethionine              Dnmmet
D-N-methylornithine             Dnmorn    N-methylcyclopentylalanine        Nmcpen
N-methylglycine                 Nala      D-N-methylphenylalanine           Dnmphe
N-methylaminoisobutyrate        Nmaib     D-N-methylproline                 Dnmpro
N-(1-methylpropyl)glycine       Nile      D-N-methylserine                  Dnmser
N-(2-methylpropyl)glycine       Nleu      D-N-methylthreonine               Dnmthr
D-N-methyltryptophan            Dnmtrp    N-(1-methylethyl)glycine          Nval
D-N-methyltyrosine              Dnmtyr    N-methyl-α-naphthylalanine        Nmanap
D-N-methylvaline                Dnmval    N-methylpenicillamine             Nmpen
γ-aminobutyric acid             Gabu      N-(p-hydroxyphenyl)glycine        Nhtyr
L-t-butylglycine                Tbug      N-(thiomethyl)glycine             Ncys
L-ethylglycine                  Etg       penicillamine                     Pen
L-homophenylalanine             Hphe      L-α-methylalanine                 Mala
L-α-methylarginine              Marg      L-α-methylasparagine              Masn
L-α-methylaspartate             Masp      L-α-methyl-t-butylglycine         Mtbug
L-α-methylcysteine              Mcys      L-methylethylglycine              Metg
L-α-methylglutamine             Mgln      L-α-methylglutamate               Mglu
L-α-methylhistidine             Mhis      L-α-methylhomophenylalanine       Mhphe
L-α-methylisoleucine            Mile      N-(2-methylthioethyl)glycine      Nmet
L-α-methylleucine               Mleu      L-α-methyllysine                  Mlys
L-α-methylmethionine            Mmet      L-α-methylnorleucine              Mnle
L-α-methylnorvaline             Mnva      L-α-methylornithine               Morn
L-α-methylphenylalanine         Mphe      L-α-methylproline                 Mpro
L-α-methylserine                Mser      L-α-methylthreonine               Mthr
L-α-methyltryptophan            Mtrp      L-α-methyltyrosine                Mtyr
L-α-methylvaline                Mval      L-N-methylhomophenylalanine       Nmhphe

N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine        Nnbhm
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine       Nnbhe
1-carboxy-1-(2,2-diphenylethylamino)cyclopropane      Nmbc



Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional 
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n=1 to n=6, 
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually 
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group 
specific-reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In addition, 
peptides can be conformationally constrained by, for example, incorporation of Cα and 
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and 
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming 
an amide bond between the N and C termini, between two side chains or between a side chain 
and the N or C terminus. 

These types of modifications may be important to stabilise the cytokines if administered to an 
individual or for use as a diagnostic reagent. 



Other derivatives contemplated by the present invention include a range of glycosylation 
variants from a completely unglycosylated molecule to a modified glycosylated molecule. 
Altered glycosylation patterns may result from expression of recombinant molecules in different 
host cells. 



Another embodiment of the present invention contemplates a method for modulating 
expression of a SOCS protein in a mammal, said method comprising contacting a gene encoding 
a SOCS or a factor/element involved in controlling expression of the SOCS gene with an 
effective amount of a modulator of SOCS expression for a time and under conditions sufficient 
to up-regulate or down-regulate or otherwise modulate expression of SOCS. An example of 
a modulator is a cytokine such as IL-6 or other transcription regulators of SOCS expression. 
Expression includes transcription or translation or both. 

Another aspect of the present invention contemplates a method of modulating activity of SOCS 
in a human, said method comprising administering to said human a modulating effective 
amount of a molecule for a time and under conditions sufficient to increase or decrease SOCS 
activity. The molecule may be a proteinaceous molecule or a chemical entity and may also be 
a derivative of SOCS or a chemical analogue or truncation mutant of SOCS. 

A further aspect of the present invention provides a method of inducing synthesis of a SOCS 
or transcription/translation of a SOCS comprising contacting a cell containing a SOCS gene 
with an effective amount of a cytokine capable of inducing said SOCS for a time and under 
conditions sufficient for said SOCS to be produced. For example, SOCS1 may be induced by 
IL-6. 

Still a further aspect of the present invention contemplates a method of modulating levels of a 
SOCS protein in a cell, said method comprising contacting a cell containing a SOCS gene with 
an effective amount of a modulator of SOCS gene expression or SOCS protein activity for a 
time and under conditions sufficient to modulate levels of said SOCS protein. 

Yet a further aspect of the present invention contemplates a method of modulating signal 
transduction in a cell containing a SOCS gene comprising contacting said cell with an effective 
amount of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient 
to modulate signal transduction. 

Even yet a further aspect of the present invention contemplates a method of influencing 
interaction between cells wherein at least one cell carries a SOCS gene, said method comprising 
contacting the cell carrying the SOCS gene with an effective amount of a modulator of SOCS 
gene expression or SOCS protein activity for a time sufficient to modulate signal transduction. 

As stated above, the present invention contemplates a range of mimetics or small molecules 
capable of acting as agonists or antagonists of the SOCS. Such molecules may be obtained 
from natural product screening such as from coral, soil, plants or the ocean or Antarctic 
environments. Alternatively, peptide, polypeptide or protein libraries or chemical libraries may 
be readily screened. For example, M1 cells expressing a SOCS do not undergo differentiation 
in the presence of IL-6. This system can be used to screen molecules which permit 
differentiation in the presence of IL-6 and a SOCS. A range of test cells may be prepared to 
screen for antagonists and agonists for a range of cytokines. Such molecules are preferably 
small molecules and may be of amino acid origin or of chemical origin. SOCS molecules 
interacting with signalling proteins (e.g. JAKs) provide molecular screens to detect molecules 
which interfere with or promote this interaction. One such screening protocol involves natural 
product screening. 

Accordingly, the present invention contemplates a pharmaceutical composition comprising 
SOCS or a derivative thereof or a modulator of SOCS expression or SOCS activity and one or 
more pharmaceutically acceptable carriers and/or diluents. These components are referred to 
as the "active ingredients". These and other aspects of the present invention apply to any SOCS 
molecules such as but not limited to SOCS1 to SOCS15. 

The pharmaceutical forms containing active ingredients suitable for injectable use include sterile 
aqueous solutions (where water soluble) and sterile powders for the extemporaneous preparation 
of sterile injectable solutions. They must be stable under the conditions of manufacture and storage 
and must be preserved against the contaminating action of microorganisms such as bacteria and 
fungi. The carrier can be a solvent or dispersion medium containing, for example, water, 
ethanol, polyol (for example, glycerol, propylene glycol and liquid polyethylene glycol, and the 
like), suitable mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for 
example, by the use of a coating such as lecithin, by the maintenance of the required particle size 
in the case of dispersion and by the use of surfactants. The prevention of the action of 
microorganisms can be brought about by various antibacterial and antifungal agents, for 
example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, 
it will be preferable to include isotonic agents, for example, sugars or sodium chloride. 
Prolonged absorption of the injectable compositions can be brought about by the use in the 
compositions of agents delaying absorption, for example, aluminum monostearate and gelatin. 

Sterile injectable solutions are prepared by incorporating the active compounds in the required 
amount in the appropriate solvent with various of the other ingredients enumerated above, as 
required, followed by filter sterilization. In the case of sterile powders for the preparation 
of sterile injectable solutions, the preferred methods of preparation are vacuum drying and the 
freeze-drying technique which yield a powder of the active ingredient plus any additional 
desired ingredient from a previously sterile-filtered solution thereof. 

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft
shell gelatin capsules, or they may be compressed into tablets. For oral therapeutic administration,
the active compound may be incorporated with excipients and used in the form of ingestible
tablets, buccal tablets, troches, capsules, elixirs, suspensions, syrups, wafers and the like. Such
compositions and preparations should contain at least 1% by weight of active compound. The
percentage of the compositions and preparations may, of course, be varied and may
conveniently be between about 5 to about 80% of the weight of the unit. The amount of active
compound in such therapeutically useful compositions is such that a suitable dosage will be
obtained. Preferred compositions or preparations according to the present invention are
prepared so that an oral dosage unit form contains between about 0.1 µg and 2000 mg of active
compound.



The tablets, troches, pills, capsules and the like may also contain the components as listed
hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin may be added, or a flavouring agent such as peppermint, oil of wintergreen or cherry
flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of
the above type, a liquid carrier. Various other materials may be present as coatings or to
otherwise modify the physical form of the dosage unit. For instance, tablets, pills, or capsules
may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound,
sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and
flavouring such as cherry or orange flavour. Of course, any material used in preparing any
dosage unit form should be pharmaceutically pure and substantially non-toxic in the amounts
employed. In addition, the active compound(s) may be incorporated into sustained-release
preparations and formulations.

The present invention also extends to forms suitable for topical application such as creams,
lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and
the like. The use of such media and agents for pharmaceutically active substances is well known
in the art. Except insofar as any conventional media or agent is incompatible with the active
ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active
ingredients can also be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specifications for
the novel dosage unit forms of the invention are dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 µg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 µg to about 2000 mg/ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.
The effective amount may also be conveniently expressed in terms of an amount per kg of body
weight. For example, from about 0.01 ng to about 10,000 mg/kg body weight may be
administered.
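
The per-kilogram expression above lends itself to a short worked calculation. The following Python sketch is illustrative only: the function name, the 70 kg body weight and the 2 mg/kg dose are assumptions chosen for the example and do not appear in the specification; only the two range limits are taken from the text.

```python
# Range limits quoted in the text: about 0.01 ng/kg to about 10,000 mg/kg.
MIN_DOSE_MG_PER_KG = 0.01e-6   # 0.01 ng expressed in mg (1 ng = 1e-6 mg)
MAX_DOSE_MG_PER_KG = 10_000.0


def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Return the total amount of active compound, in mg, for one subject.

    Hypothetical helper: it only multiplies a per-kg dose by a body weight
    after checking that the per-kg dose lies inside the quoted range.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not MIN_DOSE_MG_PER_KG <= dose_mg_per_kg <= MAX_DOSE_MG_PER_KG:
        raise ValueError("dose per kg lies outside the quoted range")
    return dose_mg_per_kg * body_weight_kg


if __name__ == "__main__":
    # Assumed example values: 2 mg/kg administered to a 70 kg subject.
    print(total_dose_mg(2.0, 70.0))
```

The range check is deliberately simple; choosing an actual dose within such a broad range would depend on the considerations set out in the preceding paragraphs.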

The pharmaceutical composition may also comprise genetic molecules such as a vector capable
of transfecting target cells where the vector carries a nucleic acid molecule capable of
modulating SOCS expression or SOCS activity. The vector may, for example, be a viral vector.
In this regard, a range of gene therapies are contemplated by the present invention including
isolating certain cells, genetically manipulating them and returning the cells to the same subject or to
a genetically related or similar subject.

Still another aspect of the present invention is directed to antibodies to SOCS and its
derivatives. Such antibodies may be monoclonal or polyclonal and may be selected from
naturally occurring antibodies to SOCS or may be specifically raised to SOCS or derivatives
thereof. In the case of the latter, SOCS or its derivatives may first need to be associated with
a carrier molecule. The antibodies and/or recombinant SOCS or its derivatives of the present
invention are particularly useful as therapeutic or diagnostic agents.

For example, SOCS and its derivatives can be used to screen for naturally occurring antibodies
to SOCS. These may occur, for example, in some autoimmune diseases. Alternatively, specific
antibodies can be used to screen for SOCS. Techniques for such assays are well known in the
art and include, for example, sandwich assays and ELISA. Knowledge of SOCS levels may be
important for diagnosis of certain cancers or a predisposition to cancers or monitoring cytokine
mediated cellular responsiveness or for monitoring certain therapeutic protocols.

Antibodies to SOCS of the present invention may be monoclonal or polyclonal. Alternatively,
fragments of antibodies may be used such as Fab fragments. Furthermore, the present invention
extends to recombinant and synthetic antibodies and to antibody hybrids. A "synthetic
antibody" is considered herein to include fragments and hybrids of antibodies. The antibodies
of this aspect of the present invention are particularly useful for immunotherapy and may also
be used as a diagnostic tool for assessing apoptosis or monitoring the program of a therapeutic
regimen.

For example, specific antibodies can be used to screen for SOCS proteins. The latter would be
important, for example, as a means for screening for levels of SOCS in a cell extract or other
biological fluid or purifying SOCS made by recombinant means from culture supernatant fluid.
Techniques for the assays contemplated herein are known in the art and include, for example,
sandwich assays and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays or a first
antibody may be used with a commercially available anti-immunoglobulin antibody. An
antibody as contemplated herein includes any antibody specific to any region of SOCS.

Both polyclonal and monoclonal antibodies are obtainable by immunization with the enzyme
or protein and either type is utilizable for immunoassays. The methods of obtaining both types
of sera are well known in the art. Polyclonal sera are less preferred but are relatively easily
prepared by injection of a suitable laboratory animal with an effective amount of SOCS, or
antigenic parts thereof, collecting serum from the animal, and isolating specific sera by any of
the known immunoadsorbent techniques. Although antibodies produced by this method are
utilizable in virtually any type of immunoassay, they are generally less favoured because of the
potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the
ability to produce them in large quantities and the homogeneity of the product. The preparation
of hybridoma cell lines for monoclonal antibody production derived by fusing an immortal cell
line and lymphocytes sensitized against the immunogenic preparation can be done by techniques
which are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting SOCS in a
biological sample from a subject, said method comprising contacting said biological sample with
an antibody specific for SOCS or its derivatives or homologues for a time and under conditions
sufficient for an antibody-SOCS complex to form and then detecting said complex.

The presence of SOCS may be determined in a number of ways such as by Western blotting
and ELISA procedures. A wide range of immunoassay techniques are available as can be seen
by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653. These, of course, include
both single-site and two-site or "sandwich" assays of the non-competitive types, as well as
the traditional competitive binding assays. These assays also include direct binding of a labelled
antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for
use in the present invention. A number of variations of the sandwich assay technique exist, and
all are intended to be encompassed by the present invention. Briefly, in a typical forward assay,
an unlabelled antibody is immobilized on a solid substrate and the sample to be tested brought
into contact with the bound molecule. After a suitable period of incubation, for a period of time
sufficient to allow formation of an antibody-antigen complex, a second antibody specific to the
antigen, labelled with a reporter molecule capable of producing a detectable signal, is then added
and incubated, allowing time sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted material is washed away, and the presence of the
antigen is determined by observation of a signal produced by the reporter molecule. The results
may either be qualitative, by simple observation of the visible signal, or may be quantitated by
comparing with a control sample containing known amounts of hapten. Variations on the
forward assay include a simultaneous assay, in which both sample and labelled antibody are
added simultaneously to the bound antibody. These techniques are well known to those skilled
in the art, including any minor variations as will be readily apparent. In accordance with the
present invention the sample is one which might contain SOCS including cell extract, tissue
biopsy or possibly serum, saliva, mucosal secretions, lymph, tissue fluid and respiratory fluid.
The sample is, therefore, generally a biological sample comprising biological fluid but also
extends to fermentation fluid and supernatant fluid such as from a cell culture.

In the typical forward sandwich assay, a first antibody having specificity for the SOCS or
antigenic parts thereof, is either covalently or passively bound to a solid surface. The solid
surface is typically glass or a polymer, the most commonly used polymers being cellulose,
polyacrylamide, nylon, polystyrene, polyvinyl chloride or polypropylene. The solid supports
may be in the form of tubes, beads, discs or microplates, or any other surface suitable for
conducting an immunoassay. The binding processes are well known in the art and generally
consist of cross-linking, covalently binding or physically adsorbing; the polymer-antibody
complex is then washed in preparation for the test sample. An aliquot of the sample to be tested is
then added to the solid phase complex and incubated for a period of time sufficient (e.g. 2-40
minutes or overnight if more convenient) and under suitable conditions (e.g. room temperature
to 37°C) to allow binding of any subunit present in the antibody. Following the incubation
period, the antibody subunit solid phase is washed and dried and incubated with a second
antibody specific for a portion of the hapten. The second antibody is linked to a reporter
molecule which is used to indicate the binding of the second antibody to the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and
then exposing the immobilized target to specific antibody which may or may not be labelled
with a reporter molecule. Depending on the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by direct labelling with the antibody.
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex. The
complex is detected by the signal emitted by the reporter molecule.

By "reporter molecule" as used in the present specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most
commonly used reporter molecules in this type of assay are either enzymes, fluorophores or
radionuclide containing molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield
a fluorescent product rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further
quantitated, usually spectrophotometrically, to give an indication of the amount of hapten which
was present in the sample. "Reporter molecule" also extends to use of cell agglutination or
inhibition of agglutination such as red blood cells on latex beads, and the like.
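
The spectrophotometric quantitation described above, in which a sample signal is compared with controls containing known amounts of hapten, can be sketched as a simple standard-curve interpolation. The Python below is a minimal illustration under stated assumptions: the function name, the piecewise-linear interpolation and the standard values are all hypothetical and are not taken from the specification, which does not prescribe any particular curve-fitting method.

```python
from typing import List, Tuple

# Hypothetical standards: (known hapten amount, measured absorbance).
# These numbers are invented for illustration only.
STANDARDS: List[Tuple[float, float]] = [
    (0.0, 0.0),
    (10.0, 0.5),
    (20.0, 1.0),
]


def interpolate_amount(standards: List[Tuple[float, float]],
                       absorbance: float) -> float:
    """Estimate the hapten amount for a sample absorbance.

    Uses piecewise-linear interpolation between neighbouring standards,
    an assumed approach chosen for simplicity.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (amt_lo, abs_lo), (amt_hi, abs_hi) in zip(points, points[1:]):
        if abs_lo <= absorbance <= abs_hi:
            if abs_hi == abs_lo:
                return amt_lo
            fraction = (absorbance - abs_lo) / (abs_hi - abs_lo)
            return amt_lo + fraction * (amt_hi - amt_lo)
    raise ValueError("absorbance lies outside the range of the standards")


if __name__ == "__main__":
    # A sample reading halfway between the two upper standards.
    print(interpolate_amount(STANDARDS, 0.75))
```

Readings outside the range of the standards raise an error rather than being extrapolated, since extrapolation beyond the known controls would not be supported by the comparison the text describes.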

Alternately, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the fluorescent
labelled antibody is allowed to bind to the first antibody-hapten complex. After washing off the
unbound reagent, the remaining tertiary complex is then exposed to the light of the appropriate
wavelength; the fluorescence observed indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well established in the art and are
particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

The present invention also contemplates genetic assays such as involving PCR analysis to detect
SOCS gene or its derivatives. Alternative methods or methods used in conjunction include
direct nucleotide sequencing or mutation scanning such as single stranded conformation
polymorphisms analysis (SSCP), as well as specific oligonucleotide hybridisation and methods such as
direct protein truncation tests.

Since cytokines are involved in transcription of some SOCS molecules, the detection of SOCS
provides surrogate markers for cytokines or cytokine activity. This may be useful in assessing
subjects with a range of conditions such as those with autoimmune diseases, for example,
rheumatoid arthritis, diabetes and stiff man syndrome amongst others.

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic
acid molecules of the present invention are generally mRNA.

Although the nucleic acid molecules of the present invention are generally in isolated form, they
may be integrated into or ligated to or otherwise fused or associated with other genetic
molecules such as vector molecules and in particular expression vector molecules. Vectors and
expression vectors are generally capable of replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell. Preferably, prokaryotic cells include E. coli,
Bacillus sp and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian
and insect cells.

Accordingly, another aspect of the present invention contemplates a genetic construct
comprising a vector portion and a mammalian and more particularly a human SOCS gene
portion, which SOCS gene portion is capable of encoding a SOCS polypeptide or a functional
or immunologically interactive derivative thereof.

Preferably, the SOCS gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said SOCS gene portion
in an appropriate cell.

In addition, the SOCS gene portion of the genetic construct may comprise all or part of the
gene fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells
comprising same.

The present invention also extends to any or all derivatives of SOCS including mutants, parts,
fragments, portions, homologues and analogues or their encoding genetic sequence including
single or multiple nucleotide or amino acid substitutions, additions and/or deletions to the
naturally occurring nucleotide or amino acid sequence. The present invention also extends to
mimetics and agonists and antagonists of SOCS.

The SOCS and its genetic sequence of the present invention will be useful in the generation of
a range of therapeutic and diagnostic reagents and will be especially useful in the detection of
a cytokine involved in a particular cellular response or a receptor for that cytokine. For
example, cells expressing a SOCS gene, such as M1 cells expressing the SOCS1 gene, will no
longer be responsive to a particular cytokine such as, in the case of SOCS1, IL-6. Clearly, the
present invention further contemplates cells such as M1 cells expressing any SOCS gene such
as from SOCS1 to SOCS15. Furthermore, the present invention provides the use of molecules
that regulate or potentiate the ability of therapeutic cytokines. For example, molecules which
block some SOCS activity may act to potentiate therapeutic cytokine activity (e.g. G-CSF).

Soluble SOCS polypeptides are also contemplated to be particularly useful in the treatment of
disease, injury or abnormality involving cytokine mediated cellular responsiveness such as
hyperimmunity, immunosuppression, allergies, hypertension and the like.

A further aspect of the present invention contemplates the use of SOCS or its functional
derivatives in the manufacture of a medicament for the treatment of conditions involving
cytokine mediated cellular responsiveness.

The present invention further contemplates transgenic mammalian cells expressing a SOCS
gene. Such cells are useful indicator cell lines for assaying for suppression of cytokine function.
One example is M1 cells expressing a SOCS gene. Such cell lines may be useful for screening
for cytokines or screening molecules such as naturally occurring molecules from plants, coral,
microorganisms or bio-organically active soil or water capable of acting as cytokine antagonists
or agonists.

The present invention further contemplates hybrids between different SOCS from the same or
different animal species. For example, a hybrid may be formed between all or a functional part
of mouse SOCS1 and human SOCS1. Alternatively, the hybrid may be between all or part of
mouse SOCS1 and mouse SOCS2. All such hybrids are contemplated herein and are
particularly useful in developing pleiotropic molecules.

The present invention further contemplates a range of genetic based diagnostic assays screening
for individuals with defective SOCS genes. Such mutations may result in cell types not being
responsive to a particular cytokine or may result in over-responsiveness leading to a range of
conditions. The SOCS genetic sequence can be readily verified using a range of PCR or other
techniques to determine whether a mutation is resident in the gene. Appropriate gene therapy
or other interventionist therapy may then be adopted.

The present invention is further described by the following non-limiting Examples.

Examples 1-16 relate to SOCS1, SOCS2 and SOCS3 which were identified on the basis of
activity. Examples 17-24 relate to various aspects of SOCS4 to SOCS15 which were cloned
initially on the basis of sequence similarity. Examples 25-36 relate to specific aspects of SOCS4
to SOCS15, respectively.
EXAMPLE 1

CELL CULTURE AND CYTOKINES

The M1 cell line was derived from a spontaneously arising leukaemia in SL mice [Ichikawa,
1969]. Parental M1 cells used in this study have been in passage at the Walter and Eliza Hall
Institute for Medical Research, Melbourne, Victoria, Australia, for approximately 10 years. M1
cells were maintained by weekly passage in Dulbecco's modified Eagle's medium (DME)
containing 10% (v/v) foetal calf serum (FCS). Recombinant cytokines are generally
available from commercial sources or were prepared by published methods. Recombinant
murine LIF was produced in Escherichia coli and purified, as previously described [Gearing,
1989]. Purified human oncostatin M was purchased from PeproTech Inc (Rocky Hill, NJ,
USA), and purified mouse IFN-γ was obtained from Genzyme Diagnostics (Cambridge, MA,
USA). Recombinant murine thrombopoietin was produced as a FLAG™-tagged fusion
protein in CHO cells and then purified.

EXAMPLE 2

AGAR COLONY ASSAYS

In order to assay the differentiation of M1 cells in response to cytokines, 300 cells were
cultured in 35 mm Petri dishes containing 1 ml of DME supplemented with 20% (v/v) foetal calf
serum (FCS), 0.3% (w/v) agar and 0.1 ml of serial dilutions of IL-6, LIF, OSM, IFN-γ, TPO or
dexamethasone (Sigma Chemical Company, St Louis, MO). After 7 days culture at 37°C in a
fully humidified atmosphere, containing 10% (v/v) CO2 in air, colonies of M1 cells were
counted and classified as differentiated if they were composed of dispersed cells or had a corona
of dispersed cells around a tightly packed centre.

EXAMPLE 3

GENERATION OF RETROVIRAL LIBRARY

A cDNA expression library was constructed from the factor-dependent haemopoietic cell line
FDC-P1, essentially as described [Rayner, 1994]. Briefly, cDNA was cloned into the retroviral
vector pRUFneo and then transfected into an amphotropic packaging cell line (PA317).
Transiently generated virus was harvested from the cell supernatant at 48 hr post-transfection,
and used to infect ψ2 ecotropic packaging cells, to generate a high titre virus-producing cell
line.

EXAMPLE 4

RETROVIRAL INFECTION OF M1 CELLS

Pools of 10^6 infected ψ2 cells were irradiated (3000 rad) and cocultivated with 10* M1 cells
in DME supplemented with 10% (v/v) FCS and 4 ng/ml Polybrene, for 2 days at 37°C. To
select for IL-6-unresponsive clones, retrovirally-infected M1 cells were washed once in DME,
and cultured at approximately 2x10^4 cells/ml in 1 ml agar cultures containing 400 µg/ml
geneticin (GibcoBRL, Grand Island, NY) and 100 ng/ml IL-6. The efficiency of infection of
M1 cells was 1-2%, as estimated by agar plating the infected cells in the presence of geneticin
only.

EXAMPLE 5

PCR

Genomic DNA from retrovirally-infected M1 cells was digested with Sac I and 1 µg of
phenol/chloroform extracted DNA was then amplified by polymerase chain reaction (PCR).
Primers used for amplification of cDNA inserts from the integrated retrovirus were GAG3 (5'
CACGCCGCCCACGTGAAGGC 3' [SEQ ID NO:1]), which corresponds to the vector gag
sequence approximately 30 bp 5' of the multiple cloning site, and HSVTK (5'
TTCGCCAATGACAAGACGCT 3' [SEQ ID NO:2]), which corresponds to the pMC1neo
sequence approximately 200 bp 3' of the multiple cloning site. The PCR entailed an initial
denaturation at 94°C for 5 min, 35 cycles of denaturation at 94°C for 1 min, annealing at 56°C
for 2 min, and extension at 72°C for 3 min, followed by a final 10 min extension. PCR products
were gel purified and then ligated into the pGEM-T plasmid (Promega, Madison, WI), and
sequenced using an ABI PRISM Dye Terminator Cycle Sequencing Kit and a Model 373
Automated DNA Sequencer (Applied Biosystems Inc., Foster City, CA).
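
The cycling conditions and primers stated above can be written down as data and checked with a few lines of Python. This is only a sketch for illustration: the class and function names are hypothetical, the total time ignores instrument ramp times between temperatures, and the primer sequences are copied as they appear in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """Hold times (minutes) for the PCR programme described in the text."""
    initial_denaturation_min: float
    cycles: int
    denaturation_min: float
    annealing_min: float
    extension_min: float
    final_extension_min: float

    def total_minutes(self) -> float:
        """Sum of programmed hold times; ramp times are not included."""
        per_cycle = (self.denaturation_min
                     + self.annealing_min
                     + self.extension_min)
        return (self.initial_denaturation_min
                + self.cycles * per_cycle
                + self.final_extension_min)


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    sequence = primer.upper()
    return sum(base in "GC" for base in sequence) / len(sequence)


# Values taken from the Example: 94°C 5 min; 35 x (94°C 1 min,
# 56°C 2 min, 72°C 3 min); final extension 10 min.
PROGRAM = CyclingProgram(5, 35, 1, 2, 3, 10)
GAG3 = "CACGCCGCCCACGTGAAGGC"    # SEQ ID NO:1 as printed
HSVTK = "TTCGCCAATGACAAGACGCT"   # SEQ ID NO:2 as printed

if __name__ == "__main__":
    print(PROGRAM.total_minutes())
    print(len(GAG3), gc_fraction(GAG3))
    print(len(HSVTK), gc_fraction(HSVTK))
```

Expressing the programme as a small immutable record keeps the stated hold times in one place, so a changed cycle number or step length can be checked without re-deriving the arithmetic by hand.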

EXAMPLE 6

CLONING OF cDNAs

Independent cDNA clones encoding mouse SOCS1 were isolated from a murine thymus cDNA
library essentially as described (Hilton et al., 1994). The nucleotide and predicted amino acid
sequences of mouse SOCS1 cDNA were compared to databases using the BLASTN and
TFASTA algorithms (Pearson and Lipman, 1988; Pearson, 1990; Altschul et al., 1990).
Oligonucleotides were designed from the ESTs encoding human SOCS1 and mouse SOCS1 and
SOCS3 and used to probe commercially available mouse thymus and spleen cDNA libraries.
Sequencing was performed using an ABI automated sequencer according to the manufacturer's
instructions.



EXAMPLE 7

SOUTHERN AND NORTHERN BLOT ANALYSES AND RT-PCR

32P-labelled probes were generated using a random decanucleotide labelling kit (Bresatec,
Adelaide, South Australia) from a 600 bp Pst I fragment encoding neomycin phosphotransferase
from the plasmid pPGKneo, a 1070 bp fragment of the SOCS1 gene obtained by digestion of the
1.4 kbp PCR product with Xho I, SOCS2, SOCS3, CIS and a 1.2 kbp fragment of the chicken
glyceraldehyde 3-phosphate dehydrogenase gene [Dugaiczyk, 1983].

Genomic DNA was isolated from cells using a proteinase K-sodium dodecyl sulfate procedure
essentially as described. Fifteen micrograms of DNA was digested with either BamH I or Sac
I, fractionated on a 0.8% (w/v) agarose gel, transferred to GeneScreenPlus membrane (Du Pont
NEN, Boston MA), prehybridised, hybridised with random-primed 32P-labelled DNA fragments
and washed essentially as described [Sambrook, 1989].

Total RNA was isolated from cells and tissues using Trizol Reagent, as recommended by the
manufacturer (GibcoBRL, Grand Island, NY). When required, polyA+ mRNA was purified
essentially as described [Alexander, 1995]. Northern blots were prehybridised, hybridised with
random-primed 32P-labelled DNA fragments and washed as described [Alexander, 1995].

To assess the induction of SOCS genes by IL-6, mice (C57BL6) were injected intravenously
with 5 µg IL-6 followed by harvest of the liver at the indicated timepoints after injection. M1
cells were cultured in the presence of 20 ng/ml IL-6 and harvested at the indicated times. For
RT-PCR analysis, bone marrow cells were harvested as described (Metcalf et al., 1995) and
stimulated for 1 hr at 37°C with 100 ng/ml of a range of cytokines. RT-PCR was performed
on total RNA as described (Metcalf et al., 1995). PCR products were resolved on an agarose
gel and Southern blots were hybridised with probes specific for each SOCS family member.
Expression of β-actin was assessed to ensure uniformity of amplification.

EXAMPLE 8

DNA CONSTRUCTS AND TRANSFECTION

A cDNA encoding epitope-tagged SOCS1 was generated by subcloning the entire SOCS1
coding region into the pEF-BOS expression vector [Mizushima, 1990], engineered to encode
an inframe FLAG epitope downstream of an initiation methionine (pF-SOCS1). Using
electroporation as described previously [Hilton, 1994], M1 cells expressing the thrombopoietin
receptor (M1.mpl) were transfected with 20 µg of Aat II-digested pF-SOCS1 expression
plasmid and 2 µg of a Sca I-digested plasmid in which transcription of a cDNA encoding
puromycin N-acetyl transferase was driven from the mouse phosphoglycerokinase promoter
(pPGKPuropA). After 48 hours in culture, transfected cells were selected with 20 µg/ml
puromycin (Sigma Chemical Company, St Louis MO), and screened for expression of SOCS1
by Western blotting, using the M2 anti-FLAG monoclonal antibody according to the
manufacturer's instructions (Eastman Kodak, Rochester NY). In other experiments M1 cells
were transfected with only the pF-SOCS1 plasmid or a control and selected by their ability to
grow in agar in the presence of 100 ng/ml of IL-6.

EXAMPLE 9

IMMUNOPRECIPITATION AND WESTERN BLOTTING

Prior to either immunoprecipitation or Western blotting, 10^7 M1 cells or their derivatives were
washed twice, resuspended in 1 ml of DME, and incubated at 37°C for 30 min. The cells were
then stimulated for 4 min at 37°C with either saline or 100 ng/ml IL-6, after which sodium
vanadate (Sigma Chemical Co., St Louis, MO) was added to a concentration of 1 mM. Cells
were placed on ice, washed once with saline containing 1 mM sodium vanadate, and then
solubilised for 5 min on ice with 300 µl 1% (v/v) Triton X-100, 150 mM NaCl, 2 mM EDTA,
50 mM Tris-HCl pH 7.4, containing Complete protease inhibitors (Boehringer Mannheim,
Mannheim, Germany) and 1 mM sodium vanadate. Lysates were cleared by centrifugation and
quantitated using a Coomassie Protein Assay Reagent (Pierce, Rockford IL).

For immunoprecipitations, equal concentrations of protein extracts (1-2 mg) were incubated
for 1 hr or overnight at 4°C with either 4 µg of anti-gp130 antibody (M20; Santa Cruz
Biotechnology Inc., Santa Cruz, CA) or 4 µg of anti-phosphotyrosine antibody (4G10; Upstate
Biotechnology Inc., Lake Placid NY), and 15 µl packed volume of Protein G Sepharose
(Pharmacia, Uppsala, Sweden) [Hilton et al, 1996]. Immunoprecipitates were washed twice
in 1% (v/v) NP40, 150 mM NaCl, 50 mM Tris-HCl pH 8.0, containing Complete protease
inhibitors (Boehringer Mannheim, Mannheim, Germany) and 1 mM sodium vanadate. The
samples were heated for 5 min at 95°C in SDS sample buffer (625 mM Tris-HCl pH 6.8, 0.05%
(w/v) SDS, 0.1% (v/v) glycerol, bromophenol blue, 0.125% (v/v) 2-mercaptoethanol),
fractionated by SDS-PAGE and immunoblotted as described above.

For Western blotting, 10 µg of protein from a cellular extract or material from an
immunoprecipitation reaction was loaded onto 4-15% Ready gels (Bio-Rad Laboratories,
Hercules CA), and resolved by sodium dodecyl sulfate polyacrylamide gel electrophoresis
(SDS-PAGE). Proteins were transferred to PVDF membrane (Micron Separations Inc.,
Westborough MA) for 1 hr at 100 V. The membranes were probed with the following primary
antibodies: anti-tyrosine phosphorylated STAT3 (1:1000 dilution; New England Biolabs,
Beverly, MA); anti-STAT3 (C-20; 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz
CA); anti-gp130 (M20, 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz CA);
anti-phosphotyrosine (horseradish peroxidase-conjugated RC20, 1:5000 dilution; Transduction
Laboratories, Lexington KY); anti-tyrosine phosphorylated MAP kinase and anti-MAP kinase
antibodies (1:1000 dilution; New England Biolabs, Beverly, MA). Blots were visualised using
peroxidase-conjugated secondary antibodies and Enhanced Chemiluminescence (ECL) reagents
according to the manufacturer's instructions (Pierce, Rockford IL).



EXAMPLE 10 
ELECTROPHORETIC MOBILITY SHIFT ASSAYS 
Assays were performed as described [Novak, 1995], using the high affinity SIF (c-sis-inducible
factor) binding site m67 [Wakao, 1994]. Protein extracts were prepared from M1 cells
incubated for 4-10 min at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml
IL-6 or 100 ng/ml IFN-γ. The binding reactions contained 4-6 µg protein (constant within a
given experiment), 5 ng ³²P-labelled m67 oligonucleotide, and 800 ng sonicated salmon sperm
DNA. For certain experiments, protein samples were preincubated with an excess of unlabelled
m67 oligonucleotide, or antibodies specific for either STAT1 (Transduction Laboratories,
Lexington, KY) or STAT3 (Santa Cruz Biotechnology Inc., Santa Cruz CA), as described
[Novak, 1995].

Western blots were performed using anti-tyrosine phosphorylated STAT3 or anti-STAT3 (New
England Biolabs, Beverly, MA) or anti-gp130 (Santa Cruz Biotechnology Inc.) as described
(Nicola et al, 1996). EMSA were performed using the m67 oligonucleotide probe, as described
(Novak et al, 1995).

EXAMPLE 11

EXPRESSION CLONING OF A NOVEL SUPPRESSOR OF 
CYTOKINE SIGNAL TRANSDUCTION 
In order to identify cDNAs capable of suppressing cytokine signal transduction, an expression
cloning approach was adopted. This strategy centred on M1 cells, a monocytic leukaemia cell
line that differentiates into mature macrophages and ceases proliferation in response to the
cytokines IL-6, LIF, OSM and IFN-γ, and the steroid dexamethasone. Parental M1 cells were
infected with the RUFneo retrovirus, into which cDNAs from the factor-dependent
haemopoietic cell line FDC-P1 had been cloned. In this retrovirus, transcription of both the
neomycin resistance gene and the cloned cDNA was driven off the powerful constitutive
promoter present in the retroviral LTR (Figure 1). When cultured in semi-solid agar, parental
M1 cells form large tightly packed colonies. Upon stimulation with IL-6, M1 cells undergo
rapid differentiation, resulting in the formation in agar of only single macrophages or small
dispersed clusters of cells. Retrovirally-infected M1 cells that were unresponsive to IL-6 were
selected in semi-solid agar culture by their ability to form large, tightly packed colonies in the
presence of IL-6 and geneticin. A single stable IL-6-unresponsive clone, 4A2, was obtained
after examining 10⁴ infected cells.

A fragment of the neomycin phosphotransferase (neo) gene was used to probe a Southern blot
of genomic DNA from clone 4A2 and this revealed that the cell line was infected with a single
retrovirus containing a cDNA approximately 1.4 kbp in length (Figure 2). PCR amplification
using primers from the retroviral vector which flanked the cDNA cloning site enabled recovery
of a 1.4 kbp cDNA insert, which we have named suppressor of cytokine signalling-1, or
SOCS1. This PCR product was used to probe a similar Southern blot of 4A2 genomic DNA
and hybridised to two fragments: one corresponded to the endogenous SOCS1 gene and
the other, which matched the size of the band seen using the neo probe, corresponded to the
SOCS1 cDNA cloned into the integrated retrovirus (Figure 2). The latter was not observed in
an M1 cell clone infected with a retrovirus containing an irrelevant cDNA. Similarly, Northern
blot analysis revealed that SOCS1 mRNA was abundant in the cell line 4A2, but not in the
control infected M1 cell clone (Figure 2).

EXAMPLE 12

SOCS1, SOCS2, SOCS3 AND CIS DEFINE A NEW FAMILY 
OF SH2-CONTAINING PROTEINS 

The SOCS1 PCR product was used as a probe to isolate homologous cDNAs from a mouse
thymus cDNA library. The sequence of the cDNAs proved to be identical to the PCR product,
suggesting that constitutive expression or overexpression, rather than mutation, of the SOCS1
protein was sufficient for generating an IL-6-unresponsive phenotype. Comparison of the
sequence of SOCS1 cDNA with nucleotide sequence databases revealed that it was present on
mouse and rat genomic DNA clones containing the protamine gene cluster found on mouse
chromosome 16. Closer inspection revealed that the 1.4 kb SOCS1 sequence was not homologous
to any of the protamine genes, but rather represented a previously unidentified open reading
frame located at the extreme 3' end of these clones (Figure 3). There were no regions of
discontinuity between the sequences of the SOCS1 cDNA and genomic locus, suggesting that
SOCS1 is encoded by a single exon. In addition to the genomic clone containing the protamine
genes, a series of murine and human expressed sequence tags (ESTs) also revealed large blocks
of nucleotide sequence identity to mouse SOCS1. The sequence information provided by the
human ESTs allowed the rapid cloning of cDNAs encoding human SOCS1.

The mouse and rat SOCS1 genes encode a 212 amino acid protein whereas the human SOCS1
gene encodes a 211 amino acid protein. Mouse, rat and human SOCS1 proteins share 95-99%
amino acid identity (Figure 9). A search of translated nucleic acid databases with the predicted
amino acid sequence of SOCS1 showed that it was most related to a recently cloned
cytokine-inducible immediate early gene product, CIS, and two classes of ESTs. Full length
cDNAs from the two classes of ESTs were isolated and found to encode proteins of similar
length and overall structure to SOCS1 and CIS. These clones were given the names SOCS2 and
SOCS3. Each of the four proteins contains a central SH2 domain and a C-terminal region termed
the SOCS motif. The SOCS1 proteins exhibit an extremely high level of amino acid sequence
similarity (95-99% identity) amongst different species. However, the forms of SOCS1,
SOCS2, SOCS3 and CIS from the same animal, while clearly defining a new family of
SH2-containing proteins, exhibited a lower amino acid identity. SOCS2 and CIS exhibit
approximately 38% amino acid identity, while the remaining members of the family share
approximately 25% amino acid identity (Figure 9). The coding regions of the genes for SOCS1
and SOCS3 appear to contain no introns while the coding regions of the genes for SOCS2 and
CIS contain one and two introns, respectively.
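
Identity figures of the kind quoted above are typically obtained by counting matching residues across aligned positions. The following is a minimal sketch only, assuming the two sequences have already been aligned (gaps written as '-') and that identity is computed over columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical and are not SOCS sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumption: both strings come from the same alignment and so have
    equal length; '-' marks a gap. Other conventions (for example,
    dividing by the shorter sequence length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy example (hypothetical sequences): 8 gap-free columns, 6 matches.
print(percent_identity("MKV-LSTAR", "MKVQLTTGR"))
```

The choice of denominator matters when comparing figures from different sources, so it is worth stating which convention was used.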



The Genbank Accession Numbers for the sequences referred to herein are mouse SOCS1
cDNA (U88325), human SOCS1 cDNA (U88326), mouse SOCS2 cDNA (U88327) and mouse
SOCS3 cDNA (U88328).



EXAMPLE 13

CONSTITUTIVE EXPRESSION OF SOCS1 SUPPRESSES THE
ACTION OF A RANGE OF CYTOKINES

To formally establish that the phenotype of the 4A2 cell line was directly related to expression
of SOCS1, and not to unrelated genetic changes which may have occurred independently in
these cells, a cDNA encoding an epitope-tagged version of SOCS1 under the control of the
EF1α promoter was transfected into parental M1 cells, and M1 cells expressing the receptor
for thrombopoietin, c-mpl (M1.mpl). Transfection of the SOCS1 expression vector into both
cell lines resulted in an increase in the frequency of IL-6-unresponsive M1 cells.

Multiple independent clones of M1 cells expressing SOCS1, as detected by Western blot,
displayed a cytokine-unresponsive phenotype that was indistinguishable from 4A2. Further, if
transfectants were not maintained in puromycin, expression of SOCS1 was lost over time and
cells regained their cytokine responsiveness. In the absence of cytokine, colonies derived from
4A2 and other SOCS1-expressing clones characteristically grew to a smaller size than colonies
formed by control M1 cells (Figure 10).

The effect of constitutive SOCS1 expression on the response of M1 cells to a range of
cytokines was investigated using the 4A2 cell line and a clone of M1.mpl cells expressing
SOCS1 (M1.mpl.SOCS1). Unlike parental M1 cells and M1.mpl cells, the two cell lines
expressing SOCS1 continued to proliferate and failed to form differentiated colonies in response
to either IL-6, LIF, OSM, IFN-γ or, in the case of the M1.mpl.SOCS1 cell line, thrombopoietin
(Figure 4). For both cell lines, however, a normal response to dexamethasone was observed,
suggesting that SOCS1 specifically affected cytokine signal transduction rather than
differentiation per se. Consistent with these data, while parental M1 cells and M1.mpl cells
became large and vacuolated in response to IL-6, 4A2 and M1.mpl.SOCS1 cells showed no
evidence of morphological differentiation in response to IL-6 or other cytokines (Figure 5).

EXAMPLE 14 

SOCS1 INHIBITS A RANGE OF IL-6 SIGNAL TRANSDUCTION
PROCESSES, INCLUDING STAT3 PHOSPHORYLATION
AND ACTIVATION

Phosphorylation of the cell surface receptor component gp130, the cytoplasmic tyrosine kinase
JAK1 and the transcription factor STAT3 is thought to play a central role in IL-6 signal
transduction. These events were compared in the parental M1 and M1.mpl cell lines and their
SOCS1-expressing counterparts. As expected, gp130 was phosphorylated rapidly in response
to IL-6 in both parental lines; however, this was reduced five- to ten-fold in the cell lines
expressing SOCS1 (Figure 6). Likewise, STAT3 phosphorylation was also reduced by
approximately ten-fold in response to IL-6 in those cell lines expressing SOCS1 (Figure 6).
Consistent with a reduction in STAT3 phosphorylation, activation of specific STAT DNA
binding complexes, as determined by electrophoretic mobility shift assay, was also reduced.
Notably, there was a reduction in the formation of SIF-A (containing STAT3), SIF-B
(STAT1/STAT3 heterodimer) and SIF-C (containing STAT1), the three STAT complexes
induced in M1 cells stimulated with IL-6 (Figure 7). Similarly, constitutive expression of
SOCS1 also inhibited IFN-γ-stimulated formation of p91 homodimers (Figure 7). STAT
phosphorylation and activation were not the only cytoplasmic processes to be affected by
SOCS1 expression, as the phosphorylation of other proteins, including Shc and MAP kinase,
was reduced to a similar extent (Figure 7).

EXAMPLE 15 

TRANSCRIPTION OF THE SOCS1 GENE IS STIMULATED BY IL-6
IN VITRO AND IN VIVO

Although SOCS1 can inhibit cytokine signal transduction when constitutively expressed in M1
cells, this does not necessarily indicate that SOCS1 normally functions to negatively regulate
an IL-6 response. In order to investigate this possibility the inventors determined whether
transcription of the SOCS1 gene is regulated in the response of M1 cells to IL-6 and, because
of the critical role IL-6 plays in regulating the acute phase response to injury and infection, the
response of the liver to intravenous injection of 5 mg IL-6. In the absence of IL-6, SOCS1
mRNA was undetectable in either M1 cells or in the liver. However, for both cell types, a 1.4
kb SOCS1 transcript was induced within 20 to 40 minutes by IL-6 (Figure 8). For M1 cells,
where the IL-6 was present throughout the experiment, the level of SOCS1 mRNA remained
elevated (Figure 8). In contrast, IL-6 was administered in vivo by a single intravenous injection
and was rapidly cleared from the circulation, resulting in a pulse of IL-6 stimulation to the liver.
Consistent with this, transient expression of SOCS1 mRNA was detectable in the liver, peaking
approximately 40 minutes after injection and declining to basal levels within 4 hours (Figure 8).

EXAMPLE 16 

REGULATION OF SOCS GENES

Since CIS was cloned as a cytokine-inducible immediate early gene, the inventors examined
whether SOCS1, SOCS2 and SOCS3 were similarly regulated. The basal pattern of expression
of the four SOCS genes was examined by Northern blot analysis of mRNA from a variety of
tissues from male and female C57Bl/6 mice (Figure 11A). Constitutive expression of SOCS1
was observed in the thymus and to a lesser extent in the spleen and the lung. SOCS2
expression was restricted primarily to the testis and in some animals the liver and lung; for
SOCS3 a low level of expression was observed in the lung, spleen and thymus, while CIS
expression was more widespread, including the testis, heart, lung, kidney and, in some animals,
the liver.

The inventors sought to determine whether expression of the four SOCS genes was regulated
by IL-6. Northern blots of mRNA prepared from the livers of untreated and IL-6-injected
mice, or from unstimulated and IL-6-stimulated M1 cells, were hybridised with labelled
fragments of SOCS1, SOCS2, SOCS3 and CIS cDNAs (Figure 11B). Expression of all four
SOCS genes was increased in the liver following IL-6 injection; however, the kinetics of
induction appeared to differ. Expression of SOCS1 and SOCS3 was transient in the liver, with
mRNA detectable 20 minutes after IL-6 injection and declining to basal levels within 4 hours
for SOCS1 and 8 hours for SOCS3. Induction of SOCS2 and CIS mRNA in the liver followed
similar initial kinetics to that of SOCS1, but was maintained at an elevated level for at least 24
hours. A similar induction of SOCS gene mRNA was observed in other organs, notably the
lung and the spleen. In contrast, in M1 cells, while SOCS1 and CIS mRNA were induced by
IL-6, no induction of either SOCS2 or SOCS3 expression was detected. This result highlights
cell type-specific differences in the expression of the genes of SOCS family members in
response to the same cytokine.



In order to examine the spectrum of cytokines capable of inducing transcription of the
various members of the SOCS gene family, bone marrow cells were stimulated for an hour with
a range of cytokines, after which mRNA was extracted and cDNA was synthesised. PCR was
then used to assess the expression of SOCS1, SOCS2, SOCS3 and CIS (Figure 11C). In the
absence of stimulation, little or no expression of any of the SOCS genes was detectable in bone
marrow by PCR. Stimulation of bone marrow cells with a broad array of cytokines appeared
capable of up-regulating mRNA for one or more members of the SOCS family. IFNγ, for
example, induced expression of all four SOCS genes, while erythropoietin, granulocyte
colony-stimulating factor, granulocyte-macrophage colony-stimulating factor and interleukin-3
induced expression of SOCS2, SOCS3 and CIS. Interestingly, tumor necrosis factor alpha,
macrophage colony-stimulating factor and interleukin-1, which act through receptors that do
not fall into the type I cytokine receptor class, also appeared capable of inducing expression of
SOCS3 and CIS, suggesting that SOCS proteins may play a broader role in regulating signal
transduction.

As constitutive expression of SOCS1 inhibited the response of M1 cells to a range of cytokines,
the inventors examined whether phosphorylation of the cell surface receptor component gp130
and the transcription factor STAT3, which are thought to play a central role in IL-6 signal
transduction, were affected. These events were compared in the parental M1 and M1.mpl cell
lines and their SOCS1-expressing counterparts. As expected, gp130 was phosphorylated
rapidly in response to IL-6 in both parental lines; however, this was reduced in the cell lines
expressing SOCS1 (Figure 12A). Likewise, STAT3 phosphorylation was also reduced in
response to IL-6 in those cell lines expressing SOCS1 (Figure 12A). Consistent with a
reduction in STAT3 phosphorylation, activation of specific STAT/DNA binding complexes, as
determined by electrophoretic mobility shift assay, was also reduced. Notably, there was a
failure to form SIF-A (containing STAT3) and SIF-B (STAT1/STAT3 heterodimer), the major
STAT complexes induced in M1 cells stimulated with IL-6 (Figure 12B). Similarly, constitutive
expression of SOCS1 also inhibited IFNγ-stimulated formation of SIF-C (STAT1 homodimer;
Figure 12B). These experiments are consistent with the proposal that SOCS1 inhibits signal
transduction upstream of receptor and STAT phosphorylation, potentially at the level of the
JAK kinases.

The ability of SOCS1 to inhibit signal transduction and ultimately the biological response to
cytokines suggests that, like the SH2-containing phosphatase SHP-1 [Ihle et al, 1994; Yi et al,
1993], the SOCS proteins may play a central role in controlling the intensity and/or duration
of a cell's response to a diverse range of extracellular stimuli by suppressing the signal
transduction process. The evidence provided here indicates that the SOCS family acts in a
classical negative feedback loop for cytokine signal transduction. Like other genes such as
OSM, expression of genes encoding the SOCS proteins is induced by cytokines through the
activation of STATs. Once expressed, it is proposed that the SOCS proteins inhibit the activity
of JAKs and so reduce the phosphorylation of receptors and STATs, thereby suppressing signal
transduction and any ensuing biological response. Importantly, inhibition of STAT activation
will, over time, lead to a reduction in SOCS gene expression, allowing cells to regain
responsiveness to cytokines.
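
The qualitative behaviour of such a negative feedback loop can be illustrated with a toy numerical model. The sketch below is illustrative only and is not derived from the experimental data: the equations, the function name `simulate` and all rate constants are hypothetical assumptions chosen so that signalling induces an inhibitor (standing in for SOCS) which in turn damps signalling.

```python
def simulate(stimulus, steps=2000, dt=0.01,
             k_syn=1.0, k_deg=0.5, k_inh=5.0):
    """Euler integration of a toy negative feedback loop.

    signal = stimulus / (1 + k_inh * socs)      (inhibitor suppresses signalling)
    d(socs)/dt = k_syn * signal - k_deg * socs  (signalling induces inhibitor)

    All rate constants are arbitrary, hypothetical values.
    Returns two lists: signal and inhibitor level at each time step.
    """
    socs = 0.0
    signals, socs_levels = [], []
    for i in range(steps):
        t = i * dt
        signal = stimulus(t) / (1.0 + k_inh * socs)
        socs += dt * (k_syn * signal - k_deg * socs)
        signals.append(signal)
        socs_levels.append(socs)
    return signals, socs_levels


# Sustained stimulus: signalling is damped as the inhibitor accumulates.
sustained_signal, sustained_socs = simulate(lambda t: 1.0)

# Brief pulse: the inhibitor rises, then decays once the stimulus is gone.
pulse_signal, pulse_socs = simulate(lambda t: 1.0 if t < 2.0 else 0.0)

print(f"sustained: signal {sustained_signal[0]:.2f} -> {sustained_signal[-1]:.2f}")
print(f"pulse: peak inhibitor {max(pulse_socs):.2f}, final {pulse_socs[-1]:.4f}")
```

In this sketch a sustained stimulus gives a maintained inhibitor level and a damped signal, whereas a pulse gives a transient rise in inhibitor that returns towards baseline, which is the general shape expected of a feedback inhibitor.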



EXAMPLE 17 

DATABASE SEARCHES

The NCBI genetic sequence database (Genbank), which encompasses the major database of
expressed sequence tags (ESTs), and the TIGR database of human expressed sequence tags were
searched for sequences with similarity to a consensus SOCS box sequence using the TFASTA
and MOTIF/PATTERN algorithms [Pearson, 1990; Cockwell and Giles, 1989]. Using the
software package SRS [Etzold et al, 1996], ESTs that exhibited similarity to the SOCS box
(and their partners derived from sequencing the other end of cDNAs) were retrieved and
assembled into contigs using Autoassembler (Applied Biosystems, Foster City, CA). Consensus
nucleotide sequences derived from overlapping ESTs were then used to search the various
databases using BLASTN [Altschul et al, 1990]. Again, positive ESTs were retrieved and
added to the contig. This process was repeated until no additional ESTs could be recovered.
Final consensus nucleotide sequences were then translated using Sequence Navigator (Applied
Biosystems, Foster City, CA).
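
The iterative search-and-extend procedure described above amounts to repeating a database search until a round recovers nothing new. The following is a minimal sketch of that control flow only, assuming a hypothetical `search` callback in place of the actual database queries and assembly software; the EST identifiers in the toy data are made up for illustration.

```python
from typing import Callable, Dict, Iterable, Set


def extend_contig(
    seed_ests: Iterable[str],
    search: Callable[[Set[str]], Set[str]],
    max_rounds: int = 100,
) -> Set[str]:
    """Repeat search-and-add until a round recovers no new ESTs.

    `search` is a hypothetical callback: given the EST identifiers
    currently in the contig, it returns identifiers of ESTs matching
    the contig consensus (in practice, a database query).
    """
    contig = set(seed_ests)
    for _ in range(max_rounds):
        new_hits = search(contig) - contig
        if not new_hits:
            break  # fixed point: no additional ESTs recovered
        contig |= new_hits
    return contig


# Toy stand-in for a database: each EST overlaps the listed ESTs.
OVERLAPS: Dict[str, Set[str]] = {
    "est_a": {"est_b"},
    "est_b": {"est_a", "est_c"},
    "est_c": {"est_b"},
    "est_x": {"est_y"},
    "est_y": {"est_x"},
}


def toy_search(contig: Set[str]) -> Set[str]:
    """Return every EST overlapping any member of the contig."""
    hits: Set[str] = set()
    for est in contig:
        hits |= OVERLAPS.get(est, set())
    return hits


print(sorted(extend_contig({"est_a"}, toy_search)))
```

The `max_rounds` cap is a defensive assumption so that a misbehaving search cannot loop indefinitely; the procedure as described simply stops when no further ESTs are found.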

The ESTs encoding the new SOCS proteins are as follows: human SOCS4 (EST81149,
EST180909, EST182619, ya99h09, ye70c04, yh53c09, yh77g11, yh87h05, yi45h07, yj04e06,
yq12h06, yq56a06, yq60e02, yq92g03, yq97h06, yr90f01, yt69c03, yv30a08, yv55f07,
yv57h09, yv87h02, yv98e11, yw68d10, yw82a03, yx08a07, yx72h06, yx76b09, yy37h08,
yy66b02, za81f08, zb18f07, zc06e08, zd14g06, zd51h12, zd52b09, ze25g11, ze69f02, zf54f03,
zh96e07, zv66h12, zs83a08 and zs83g08); mouse SOCS-4 (mc65f04, mf42e06, mp10c10,
mr81g09 and mt19h12); human SOCS-5 (EST15B103, EST15B105, EST27530 and
zf50f01); mouse SOCS-5 (mc55a01, mh98f09, my26h12 and ve24e06); human SOCS-6
(yf61e08, yf93a09, yg05f12, yg41f04, yg45c02, yh11f10, yh13b05, zc35a12, ze02h08, zl09a03,
zl69e10, zn39d08 and zo39e06); mouse SOCS-6 (mc04c05, md48a03, mf31d03, mh26b07,
mh78e11, mh88h09, mh94h07, mi27h04, mj29c05, mp66g04, mw75g03, va53b05,
vb34h02, vc55d07, vc59e05, vc67d03, vc68d10, vc97h01, vc99c08, vd07h03, vd08c01,
vd09b12, vd19b02, vd29a04 and vd46d06); human SOCS-7 (STS WI30171, EST00939,
EST12913, yc29b05, yp49f10, zt10f03 and zx73g04); mouse SOCS-7 (mj39a01 and
vi52h07); mouse SOCS-8 (mj6e09 and vj27a029); human SOCS-9 (CSRL-82f2-u,
EST114054, yy06b07, yy06g06, zr40c09, zr72h01, yx92c08, yx93b08 and hfe0662); mouse
SOCS-9 (me65d05); human SOCS-10 (aa48h10, zp35h01, zp97h12, zq08h01, zr34g05,
EST73000 and HSDHEI005); mouse SOCS-10 (mb14d12, mb40f06, mg89b11, mq89e12,
mp03g12 and vh53c11); human SOCS-11 (zt24h06 and zr43b02); human SOCS-13
(EST59161); mouse SOCS-13 (ma39a09, me60c05, mi78g05, mk10c11, mo48g12, mp94a01,
vb57c07 and vh07c11); human SOCS-14 (mi75e03, vd29h11 and vd53g07).

EXAMPLE 18
cDNA CLONING

Based on the consensus sequences derived from overlapping ESTs, oligonucleotides were
designed that were specific to various members of the SOCS family. As described above,
oligonucleotides were labelled and used to screen commercially available genomic and cDNA
libraries cloned into λ bacteriophage. Genomic and/or cDNA clones covering the entire coding
region of mouse SOCS4, mouse SOCS5 and mouse SOCS6 were isolated. The entire gene for
SOCS15 is on the human 12p13 BAC (Genbank Accession Number HSU47924) and the mouse
chromosome 6 BAC (Genbank Accession Number AC002393). Partial cDNAs for mouse
SOCS7, SOCS9, SOCS10, SOCS11, SOCS12, SOCS13 and SOCS14 were also isolated.

EXAMPLE 19 
NORTHERN BLOTS AND rtPCR 

Northern blots were performed as described above. The sources of hybridisation probes were
as follows: (i) the entire coding region of the mouse SOCS1 cDNA, (ii) a 1059 bp PCR product
derived from the coding region of SOCS5 upstream of the SH2 domain, (iii) the entire coding
region of the mouse SOCS6 cDNA, (iv) a 790 bp PCR product derived from the coding region
of a partial SOCS7 cDNA and (v) a 1200 bp Pst I fragment of the chicken glyceraldehyde
3-phosphate dehydrogenase (GAPDH) cDNA.

EXAMPLE 20 
ADDITIONAL MEMBERS OF SOCS FAMILY 

SOCS1, SOCS2 and SOCS3 are members of the SOCS protein family identified in Examples
1-16. Each contains a central SH2 domain and a conserved motif at the C-terminus, named the
SOCS box. In order to isolate further members of this protein family, various DNA databases
were searched with the amino acid sequence corresponding to conserved residues of the SOCS
box. This search revealed the presence of human and mouse ESTs encoding twelve further
members of the SOCS protein family (Figure 13). Using this sequence information, cDNAs
encoding SOCS4, SOCS5, SOCS6, SOCS7, SOCS9, SOCS10, SOCS11, SOCS12, SOCS13,
SOCS14 and SOCS15 have been isolated. Further analysis of contigs derived from ESTs and
cDNAs revealed that the SOCS proteins could be placed into three groups according to their
predicted structure N-terminal of the SOCS box. The three groups are those with (i) SH2
domains, (ii) WD-40 repeats and (iii) ankyrin repeats.




EXAMPLE 21 
SOCS PROTEINS WITH SH2 DOMAINS

Eight SOCS proteins with SH2 domains have been identified. These include SOCS1, SOCS2
and SOCS3, SOCS5, SOCS9, SOCS11 and SOCS14 (Figure 13). Full length cDNAs were
isolated for mouse SOCS5 and SOCS14 and partial clones encoding mouse SOCS9 and
SOCS14. Analysis of primary amino acid sequence and genomic structure suggests that pairs
of these proteins (SOCS1 and SOCS3, SOCS2 and CIS, SOCS5 and SOCS14, and SOCS9 and
SOCS11) are most closely related (Figure 13). Indeed, the SH2 domains of SOCS5 and
SOCS14 are almost identical (Figure 13B), and unlike CIS, SOCS1, SOCS2 and SOCS3,
SOCS5 and SOCS14 have an extensive, though less well conserved, N-terminal region
preceding their SH2 domains (Figure 13A).

EXAMPLE 22 

SOCS PROTEINS WITH WD-40 REPEATS

Four SOCS proteins with WD-40 repeats were identified. As with the SOCS proteins with
SH2 domains, pairs of these proteins appeared to be closely related. Full length cDNAs of
mouse SOCS4 and SOCS6 were isolated and shown to encode proteins containing eight
WD-40 repeats N-terminal of the SOCS box (Figure 13), and SOCS4 and SOCS6 share 65%
amino acid similarity. SOCS15 was recognised as an open reading frame upon sequencing BACs
from human chromosome 12p13 and the syntenic region of mouse chromosome 6 [Ansari-Lari
et al, 1997]. In the human, chimp and mouse, SOCS15 is encoded by a gene with two coding
exons that lies within a few hundred base pairs of the 3' end of the triose phosphate isomerase
(TPI) gene, but which is encoded on the opposite strand to TPI (9). In addition to a C-terminal
SOCS box, the SOCS15 protein contains four WD-40 repeats. Interestingly, within the EST
databases, there are sequences of nematode, insect and fish relatives of SOCS15. SOCS15
appears most closely related to SOCS13.



EXAMPLE 23 

SOCS PROTEINS WITH ANKYRIN REPEATS

Three SOCS proteins with ankyrin repeats were identified. Analysis of partial cDNAs of mouse
SOCS7, SOCS10 and SOCS12 demonstrated the presence of multiple ankyrin repeats.

EXAMPLE 24

EXPRESSION PATTERN OF SOCS PROTEINS

The expression of mRNA from representative members of each class of SOCS proteins
(SOCS1 and SOCS5 from the SH2 domain group, SOCS6 from the WD-40 repeat group and
SOCS7 from the ankyrin repeat group) was examined. As shown above, SOCS1 mRNA is
found in abundance in the thymus and at lower levels in other adult tissues.

Since transcription of the SOCS1 gene is induced by cytokines, the inventors sought to
determine whether levels of SOCS5, SOCS6 and SOCS7 mRNA increased upon cytokine
stimulation. In the livers of mice injected with IL-6, SOCS1 mRNA is detectable after 20 min
and decreases to background levels within 2 hours. In contrast, the kinetics of SOCS5 mRNA
expression are quite different, being only detectable 12 to 24 hours after IL-6 injection. SOCS6
mRNA appears to be expressed constitutively while SOCS7 mRNA was not detected in the
liver either before injection of IL-6 or at any time after injection.

Expression of these genes was also examined after cytokine stimulation of the factor-dependent
cell line FDCP-1 engineered to express bcl-w. Again, while SOCS6 mRNA was expressed
constitutively.



EXAMPLE 25

SOCS4

Mouse and human SOCS4 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS4 cDNAs are
tabulated below (Tables 4.1 and 4.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library cloned into λ-bacteriophage. Two cDNAs encoding mouse SOCS4
were isolated and sequenced in their entirety (Figure 15) and shown to overlap the mouse
ESTs identified in the database (Table 4.1 and Figure 17). These cDNAs include a region of
5' untranslated region, the entire mouse SOCS4 coding region and a region of 3' untranslated
region (Figure 17). Analysis of the sequence confirms that the SOCS4 cDNA encodes a
SOCS box at its C-terminus and a series of 8 WD-40 repeats before the SOCS box (Figures
17 and 16). The relationship of the two sequence contigs of human SOCS4 (h4.1 and h4.2)
to the experimentally determined mouse SOCS4 cDNA sequence is shown in Figure 17. The
nucleotide sequence of the two human contigs is listed in Figure 18.

SEQ ID NOs: 13 and 14 represent the nucleotide sequence of murine SOCS4 and the
corresponding amino acid sequence. SEQ ID NOs: 15 and 16 are SOCS4 cDNA human
contigs h4.1 and h4.2, respectively.

EXAMPLE 26

SOCS5

Mouse and human SOCS5 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS5 cDNAs are
tabulated below (Tables 5.1 and 5.2). Using sequence information derived from mouse and
human ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus cDNA library, a mouse genomic DNA library and a human thymus
cDNA library cloned into λ-bacteriophage. A single genomic DNA clone (57-2) and a single
cDNA clone (5-3-2) encoding mouse SOCS5 were isolated and sequenced in their entirety and
shown to overlap with the mouse ESTs identified in the database (Figures 19 and 20A). The
entire coding region of mouse SOCS5, in addition to a region of the 5' and 3' untranslated
regions, appears to be encoded on a single exon (Figure 19). Analysis of the sequence (Figure 20)
confirms that the SOCS5 genomic and cDNA clones encode a protein with a SOCS box at its
C-terminus in addition to an SH2 domain (Figures 19 and 20B). The relationship of the human
SOCS5 contig (h5.1; Figure 21) derived from analysis of cDNA clone 5-94-2 and the human
SOCS5 ESTs (Table 5.2) to the mouse SOCS5 DNA sequence is shown in Figure 19. The
nucleotide sequence and corresponding amino acid sequence of murine SOCS5 are shown in
SEQ ID NOs: 17 and 18, respectively. The human SOCS5 nucleotide sequence is shown in
SEQ ID NO: 19.

EXAMPLE 27

SOCS6

Mouse and human SOCS6 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS6 cDNAs are
tabulated below (Tables 6.1 and 6.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. Eight cDNA clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N, 6-5N)
encoding mouse SOCS6 were isolated and sequenced in their entirety and shown
to overlap with the mouse ESTs identified in the database (Figures 22 and 23A). Analysis of
the sequence (Figure 23) confirms that the mouse SOCS6 cDNA clones encode a protein with
a SOCS box at its C-terminus in addition to eight WD-40 repeats (Figures 22 and 23B). The
relationship of the human SOCS6 contigs (h6.1 and h6.2; Figure 24) derived from analysis of
human SOCS6 ESTs (Table 6.2) to the mouse SOCS6 DNA sequence is shown in Figure 22.
The nucleotide and corresponding amino acid sequences of murine SOCS6 are shown in SEQ
ID NOs: 20 and 21, respectively. SOCS6 human contigs h6.1 and h6.2 are shown in SEQ ID
NOs: 22 and 23, respectively.

EXAMPLE 28



SOCS7 



Mouse and human SOCS7 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS7 cDNAs are
tabulated below (Tables 7.1 and 7.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. One cDNA clone (74-10A-11) encoding mouse SOCS7
was isolated and sequenced in its entirety and shown to overlap with the mouse ESTs identified
in the database (Figures 25 and 26A). Analysis of the sequence (Figure 26) suggests that
mouse SOCS7 encodes a protein with a SOCS box at its C-terminus, in addition to several
ankyrin repeats (Figures 25 and 26B). The relationship of the human SOCS7 contigs (h7.1 and
h7.2; Figure 27) derived from analysis of human SOCS7 ESTs (Table 7.2) to the mouse
SOCS7 DNA sequence is shown in Figure 25. The nucleotide and corresponding amino acid
sequences of murine SOCS7 are shown in SEQ ID NOs: 24 and 25, respectively. The
nucleotide sequences of SOCS7 human contigs h7.1 and h7.2 are shown in SEQ ID NOs: 26 and
27, respectively.



EXAMPLE 29

SOCS8

ESTs derived from mouse SOCS8 cDNAs are tabulated below (Table 8.1). As described for
other members of the SOCS family, it is possible to isolate cDNAs for mouse SOCS8 using
sequence information derived from mouse ESTs. The relationship of the ESTs to the predicted
coding region of SOCS8 is shown in Figure 28, with the nucleotide sequence obtained from
the ESTs shown in Figure 29A and the partial amino acid sequence of SOCS8 shown in Figure
29B. The nucleotide sequence and corresponding amino acid sequence for murine SOCS8 are
shown in SEQ ID NOs:28 and 29, respectively.

EXAMPLE 30

SOCS9

Mouse and human SOCS9 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS9 cDNAs are
tabulated below (Tables 9.1 and 9.2). The relationship of the mouse SOCS9 contig (m9.1;
Figure 9.2) derived from analysis of the mouse SOCS9 EST (Table 9.1) to the human SOCS9
DNA contig (h9.1; Figure 32) derived from analysis of human SOCS9 ESTs (Table 9.2) is
shown in Figure 31. Analysis of the sequence (Figure 32) indicates that the human SOCS9
cDNA encodes a protein with a SOCS box at its C-terminus, in addition to an SH2 domain
(Figure 30). The nucleotide sequence of murine SOCS9 cDNA is shown in SEQ ID NO:30.
The nucleotide sequence of human SOCS9 cDNA is shown in SEQ ID NO:31.



EXAMPLE 31

SOCS10

Mouse and human SOCS10 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS10 cDNAs are
tabulated below (Tables 10.1 and 10.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24) encoding mouse
SOCS10 were isolated, sequenced in their entirety and shown to overlap with the mouse and
human ESTs identified in the database (Figures 33 and 34). Analysis of the sequence (Figure
34) indicates that the mouse SOCS10 cDNA clone is not full length but that it does encode a
protein with a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figure 33).
The relationship of the human SOCS10 contigs (h10.1 and h10.2; Figure 35) derived from
analysis of human SOCS10 ESTs (Table 10.2) to the mouse SOCS10 DNA sequence is shown
in Figure 33. Comparison of mouse cDNA clones and ESTs with human ESTs suggests that
the 3' untranslated regions of mouse and human SOCS10 differ significantly. The nucleotide
sequence of murine SOCS10 is shown in SEQ ID NO:32 and the nucleotide sequences of
SOCS10 human contigs h10.1 and h10.2 are shown in SEQ ID NOs:33 and 34, respectively.

EXAMPLE 32

SOCS11

Human SOCS11 was recognized through searching EST databases using the SOCS box
consensus (Figure 13). Those ESTs derived from human SOCS11 cDNAs are tabulated below
(Tables 11.1 and 11.2). The relationship of the human SOCS11 contig (h11.1; Figure 36A, B),
derived from analysis of ESTs (Table 11.2), to the predicted encoded protein is shown in Figure
37. Analysis of the sequence indicates that the human SOCS11 cDNA encodes a protein with
a SOCS box at its C-terminus, in addition to an SH2 domain (Figures 37 and 36B). The
nucleotide sequence and corresponding amino acid sequence of human SOCS11 are represented
in SEQ ID NOs:35 and 36, respectively.

EXAMPLE 33

SOCS12

Mouse and human SOCS12 were recognized through searching EST databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS12
cDNAs are tabulated below (Tables 12.1 and 12.2). Using sequence information derived from
mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24)
encoding mouse SOCS12 were isolated, sequenced in their entirety and shown to overlap with
the mouse and human ESTs identified in the database (Figures 38 and 39). Analysis of the
sequence (Figures 39 and 40) indicates that the SOCS12 cDNA clone encodes a protein with
a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figure 38). The
relationship of the human SOCS12 contigs (h12.1 and h12.2; Figure 40) derived from analysis
of human SOCS12 ESTs (Table 12.2) to the mouse SOCS12 DNA sequence is shown in Figure
38. Comparison of mouse cDNA clones and ESTs with human ESTs suggests that the 3'
untranslated regions of mouse and human SOCS12 differ significantly. The nucleotide
sequence of SOCS12 is shown in SEQ ID NO:37. The nucleotide sequences of human SOCS12
contigs h12.1 and h12.2 are shown in SEQ ID NOs:38 and 39, respectively.

EXAMPLE 34

SOCS13

Mouse and human SOCS13 were recognized through searching EST databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS13
cDNAs are tabulated below (Tables 13.1 and 13.2). Using sequence information derived from
mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus and a mouse embryo cDNA library. Three cDNA clones (62-1, 62-6-7
and 62-14) encoding mouse SOCS13 were isolated, sequenced in their entirety and shown
to overlap with the mouse ESTs identified in the database (Figures 41 and 42A). Analysis of the
sequence (Figure 42) indicates that the mouse SOCS13 cDNA encodes a protein with a SOCS
box at its C-terminus, in addition to a potential WD-40 repeat (Figures 41 and 42B). The
relationship of the human SOCS13 contigs (h13.1 and h13.2; Figure 43) derived from analysis
of human SOCS13 ESTs (Table 13.2) to the mouse SOCS13 DNA sequence is shown in Figure
41. The nucleotide sequence and corresponding amino acid sequence of murine SOCS13 are
shown in SEQ ID NOs:40 and 41, respectively. The nucleotide sequence of human SOCS13
contig h13.1 is shown in SEQ ID NO:42.

EXAMPLE 35

SOCS14

Mouse and human SOCS14 were recognized through searching EST databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS14
cDNAs are tabulated below (Tables 14.1 and 14.2). Using sequence information derived from
mouse and human ESTs, several oligonucleotides were designed and used to screen, in the
conventional manner, a mouse thymus cDNA library, a mouse genomic DNA library and a
human thymus cDNA library cloned into λ-bacteriophage. A single genomic DNA clone (57-2)
and a single cDNA clone (5-3-2) encoding mouse SOCS14 were isolated and sequenced in their
entirety and shown to overlap with the mouse ESTs identified in the database (Figures 44 and
45A). The entire coding region, in addition to a region of the 5' and 3' untranslated regions, of
mouse SOCS14 appears to be encoded on a single exon (Figure 44). Analysis of the sequence
(Figure 45) confirms that SOCS14 genomic and cDNA clones encode a protein with a SOCS
box at its C-terminus in addition to an SH2 domain (Figures 44 and 45B). The relationship of
the human SOCS14 contig (h14.1; Figure 14.3) derived from analysis of cDNA clone 5-94-2
and the human SOCS14 ESTs (Table 14.2) to the mouse SOCS14 DNA sequence is shown
in Figure 44.

The nucleotide sequence and corresponding amino acid sequence of murine SOCS14 are
shown in SEQ ID NOs: 43 and 44, respectively.

EXAMPLE 36

SOCS15

Mouse and human SOCS15 were recognized through searching DNA databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS15
cDNAs are tabulated below (Tables 15.1 and 15.2), as are a mouse and a human BAC that
contain the entire mouse and human SOCS15 genes. Using sequence information derived from
the ESTs and the BACs, it is possible to predict the entire amino acid sequence of SOCS15 and,
as described for the other SOCS genes, it is feasible to design specific oligonucleotide probes
to allow cDNAs to be isolated. The relationship of the BACs to the ESTs is shown in Figure
46 and the nucleotide and predicted amino acid sequences of SOCS15, derived from the
mouse and human BACs, are shown in Figures 47 and 48. The nucleotide sequence and
corresponding amino acid sequence of murine SOCS15 are shown in SEQ ID NOs:46 and 47,
respectively. The nucleotide and corresponding amino acid sequences of human SOCS15 are
shown in SEQ ID NOs:48 and 49, respectively.



These Examples show interaction between SOCS and JAK2 kinase. Interaction is mediated via
the SH2 domain of SOCS1, 2, 3 and CIS. The interaction resulted in inhibition of JAK2 kinase
activity by SOCS1 (Figure 49). General interaction between JAK2 and SOCS1, 2, 3 and CIS
is shown in Figure 50.

The following methods are employed:

Immunoprecipitation: Cos 6 cells were transiently transfected by electroporation and cultured
for 48 hours. Cells were then lysed on ice in lysis buffer (50 mM Tris/HCl, pH 7.5, 150 mM
NaCl, 1% v/v Triton-X-100, 1 mM EDTA, 1 mM NaF, 1 mM Na3VO4) with the addition of
complete protease inhibitors (Boehringer Mannheim), centrifuged at 4°C (14,000 x g, 10 min)
and the supernatant retained for immunoprecipitation. JAK2 proteins were immunoprecipitated
using 5 µl anti-JAK2 antibody (UBI). Antigen-antibody complexes were recovered using
protein A-Sepharose (30 µl of a 50% slurry).
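The lysis buffer recipe above can be turned into pipetting volumes with the usual C1V1 = C2V2 relation. The following is a minimal sketch only: the final concentrations come from the text, but every stock concentration, the 10 ml batch size and the function name `stock_volumes_ml` are assumptions made for illustration and are not stated in the specification.

```python
# Final concentrations are taken from the lysis buffer recipe in the text.
# Stock concentrations are ASSUMED for illustration; they are not in the source.
LYSIS_BUFFER = {
    # component: (final concentration, assumed stock concentration), same unit per pair
    "Tris/HCl pH 7.5 (mM)": (50.0, 1000.0),
    "NaCl (mM)": (150.0, 5000.0),
    "Triton X-100 (% v/v)": (1.0, 10.0),
    "EDTA (mM)": (1.0, 500.0),
    "NaF (mM)": (1.0, 500.0),
    "Na3VO4 (mM)": (1.0, 200.0),
}


def stock_volumes_ml(recipe, final_volume_ml):
    """Return the volume (ml) of each stock, plus water, for the requested batch size."""
    # C1 * V1 = C2 * V2  =>  V1 = V2 * C2 / C1
    volumes = {
        name: final_volume_ml * final / stock
        for name, (final, stock) in recipe.items()
    }
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for name, ml in stock_volumes_ml(LYSIS_BUFFER, 10.0).items():
        print(f"{name:24s} {ml:6.3f} ml")
```

Protease inhibitors are left out of the sketch because the text gives no concentration for them.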



Western blotting: Immunoprecipitates were analysed by sodium dodecyl sulphate (SDS)-
polyacrylamide gel electrophoresis (PAGE) under reducing conditions. Protein was then
electrophoretically transferred to nitrocellulose, blocked overnight in 10% w/v skim-milk and
washed in PBS/0.1% v/v Tween-20 (Sigma) (wash buffer) prior to incubation with either
anti-phosphotyrosine antibody (4G10) (1:5000, UBI), anti-FLAG antibody (1.6 µg/ml) or anti-JAK2
antibody (1:2000, UBI) diluted in wash buffer/1% w/v BSA for 2 hr. Nitrocellulose blots were
washed and primary antibody detected with either peroxidase-conjugated sheep anti-rabbit
immunoglobulin (1:5000, Silenus) or peroxidase-conjugated sheep anti-mouse immunoglobulin
(1:5000, Silenus) diluted in wash buffer/1% w/v BSA. Blots were washed and antibody binding
visualised using the enhanced chemiluminescence (ECL) system (Amersham, UK) according
to the manufacturers' instructions.
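The antibody dilutions quoted above (1:5000, 1:2000 and 1.6 µg/ml) translate into stock amounts by simple arithmetic. The sketch below assumes a 10 ml incubation volume, which the text does not specify, and uses hypothetical helper names.

```python
def stock_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Microlitres of antibody stock needed for a 1:dilution_factor dilution."""
    return final_volume_ml * 1000.0 / dilution_factor


def antibody_mass_ug(final_volume_ml: float, conc_ug_per_ml: float) -> float:
    """Micrograms of antibody needed for a given final concentration."""
    return final_volume_ml * conc_ug_per_ml


if __name__ == "__main__":
    volume_ml = 10.0  # ASSUMED incubation volume, not given in the text
    print("anti-phosphotyrosine, 1:5000:", stock_volume_ul(volume_ml, 5000), "µl")
    print("anti-JAK2, 1:2000:", stock_volume_ul(volume_ml, 2000), "µl")
    print("anti-FLAG, 1.6 µg/ml:", antibody_mass_ug(volume_ml, 1.6), "µg")
```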

In vitro kinase assay: An in vitro kinase assay was performed to assess intrinsic JAK2 kinase
catalytic activity. JAK2 proteins were immunoprecipitated as described, washed twice in kinase
assay buffer (50 mM NaCl, 5 mM MgCl2, 5 mM MnCl2, 1 mM NaF, 1 mM Na3VO4, 10 mM
HEPES, pH 7.4) and suspended in an equal volume of kinase buffer containing 0.25 µCi/ml
(γ-32P)-ATP (30 min, room temperature). Excess (γ-32P)-ATP was removed and the
immunoprecipitates analysed by SDS/PAGE under reducing conditions. Gels were subjected
to a mild alkaline hydrolysis by treatment with 1 M KOH (55°C, 2 hours) to remove
phosphoserine and phosphothreonine. Radioactive bands were visualised with IMAGEQUANT
software on a PhosphorImager system (Molecular Dynamics, Sunnyvale, CA, USA).

EXAMPLE 38
MAKING SOCS-1 KNOCKOUT CONSTRUCTS

Diagrams of plasmid constructs and knockout constructs are shown in Figures 51-53. The
genomic SOCS-1 clone 95-11-10 was digested with the restriction enzymes BamHI and EcoRI
to obtain a 3.6Kb DNA fragment 3' of the coding region (SOCS-1 exon), which was used as
the 3' arm in the SOCS-1 knockout vectors. The ends of this fragment were then blunted. This
fragment was then ligated into the following vectors:
pBgalpAloxNeo
and pBgalpAloxNeoTK
which had been linearized at the unique XhoI site and then blunted. This ligation resulted in the
formation of the following vectors:

3'SOCS-1 arm in pBgalpAloxNeo
and 3'SOCS-1 arm in pBgalpAloxNeoTK

The 5' arm of the SOCS-1 knockout vectors was constructed by using PCR to generate a 2.5Kb
PCR product from the genomic SOCS-1 clone 95-11-10 just 5' of the SOCS-1 coding region
(SOCS-1 exon). The oligos used to generate this product were:

5' oligo (sense) (2465)

AGCT AGA TCT GGA CCC TAC AAT GGC AGC [SEQ ID NO:49]

3' oligo (antisense) (2466)

AGCT AG ATC TGC CAT CCT ACT CGA GGG GCC AGC TGG [SEQ ID NO:50]

The PCR product was then digested with the restriction enzyme BglII to generate BglII ends
on the PCR product. This 5' SOCS-1 PCR product, with BglII ends, was then ligated as follows:

3'SOCS-1 arm in pBgalpAloxNeo and 3'SOCS-1 arm in pBgalpAloxNeoTK, which had been
linearized with the unique restriction enzyme BamHI. This resulted in the following vectors
being formed:

5'&3'SOCS-1 arms in pBgalpAloxNeo
and 5'&3'SOCS-1 arms in pBgalpAloxNeoTK
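As a quick check on the primer design, both oligonucleotides listed above (2465 and 2466) can be scanned for the BglII recognition sequence, AGATCT, which is needed for the BglII digestion of the PCR product. This is an illustrative sketch: the sequences are copied from the text with spaces removed, and the helper names are hypothetical.

```python
# Oligonucleotides as printed in the text, with spaces removed.
SENSE_2465 = "AGCTAGATCTGGACCCTACAATGGCAGC"
ANTISENSE_2466 = "AGCTAGATCTGCCATCCTACTCGAGGGGCCAGCTGG"

BGLII_SITE = "AGATCT"  # BglII recognition sequence (palindromic)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_positions(seq: str, site: str) -> list:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


if __name__ == "__main__":
    for name, oligo in (("2465", SENSE_2465), ("2466", ANTISENSE_2466)):
        print(name, len(oligo), "nt, BglII site(s) at", site_positions(oligo, BGLII_SITE))
    # A palindromic site reads the same on both strands.
    print(reverse_complement(BGLII_SITE) == BGLII_SITE)
```

Each primer carries a single BglII site immediately after a four-base 5' overhang (AGCT), so digestion of the amplified product leaves BglII ends at both termini, consistent with the cloning step described above.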

These were the final SOCS-1 knockout constructs. Both these constructs lacked the entire
SOCS-1 coding region (SOCS-1 exon), which was replaced with portions of the βgal, β-globin
polyA, PGK promoter, neomycin and PGK polyA sequences. The 5'&3'SOCS-1 arms in
pBgalpAloxNeoTK vector also contained the thymidine kinase gene sequence, between the
neomycin and PGK polyA sequences.

The vectors: 5'&3'SOCS-1 arms in pBgalpAloxNeo
and 5'&3'SOCS-1 arms in pBgalpAloxNeoTK
were linearized with the unique restriction enzyme NotI and then transfected into embryonic
stem cells by electroporation. Clones which were resistant to neomycin were selected and
analysed by Southern blot to determine if they contained the correctly integrated SOCS-1
targeting sequence. In order to determine if correct integration had occurred, genomic DNA
from the neomycin resistant clones was digested with the restriction enzyme EcoRI. The
digested DNA was then blotted onto nylon filters and probed with a 1.5Kb EcoRI/HindIII
DNA fragment, which was further 5' of the 5' arm sequence used in the knockout constructs.
The band sizes expected for correct integration were:

Wild type SOCS-1 allele 5.4Kb

SOCS-1 knockout allele 8.2Kb in 5'&3'SOCS-1 arms in pBgalpAloxNeo
or 11Kb in 5'&3'SOCS-1 arms in pBgalpAloxNeoTK transformed cells.
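The expected band sizes listed above lend themselves to a simple scoring rule for the Southern blots. The sketch below is illustrative only: the band sizes are those given in the text, while the 0.3 Kb matching tolerance, the result labels and the function names are assumptions.

```python
# Expected EcoRI band sizes (Kb) for the 5' probe, as listed in the text.
EXPECTED_KB = {
    "wild type": 5.4,
    "knockout (pBgalpAloxNeo)": 8.2,
    "knockout (pBgalpAloxNeoTK)": 11.0,
}


def assign_bands(observed_kb, tolerance=0.3):
    """Return the sorted allele labels matched by any observed band.

    `tolerance` (Kb) is an ASSUMED allowance for gel sizing error.
    """
    labels = set()
    for band in observed_kb:
        for label, size in EXPECTED_KB.items():
            if abs(band - size) <= tolerance:
                labels.add(label)
    return sorted(labels)


def score_clone(observed_kb, tolerance=0.3):
    """Classify a neomycin-resistant clone from its observed band sizes."""
    labels = assign_bands(observed_kb, tolerance)
    has_wild_type = "wild type" in labels
    has_knockout = any(label.startswith("knockout") for label in labels)
    if has_wild_type and has_knockout:
        return "targeted (one wild type and one knockout allele)"
    if has_knockout:
        return "knockout band only"
    if has_wild_type:
        return "wild type only (no targeted integration detected)"
    return "no expected band"


if __name__ == "__main__":
    print(score_clone([5.4, 8.2]))
    print(score_clone([5.5]))
    print(score_clone([5.3, 11.1]))
```

A clone showing both the 5.4Kb band and one of the knockout bands would be the pattern expected for a correctly targeted heterozygous embryonic stem cell clone.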

Those skilled in the art will appreciate that the invention described herein is susceptible to
variations and modifications other than those specifically described. It is to be understood that
the invention includes all such variations and modifications. The invention also includes all of
the steps, features, compositions and compounds referred to or indicated in this specification,
individually or collectively, and any and all combinations of any two or more of said steps or
features.
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Table4.1 

Summary of ESTs derived from mouse SOCS-4 cDNAs 

SOCS Species EST name End EST no 

SOCS-4 Mouse mc65fD4 5' EST0549700 



mf42e06 



EST0593477 



mplOclO 5' EST0747905 

mr81g09 5 f EST0783081 

mtl9hl2 5* EST08 16531 



Table 4.2 

Summary of ESTs derived from human SOCS-4 cDNAs 



SOCS 
SOCS-4 



Species 
Human 



EST name 

27b5 

30d2 

J0159F 

J3802F 

EST19523 

EST81149 

EST1 80909 

EST182619 

ya99h09 
ye70c04 
yh53c09 



End 

5' 
5' 
5' 
5' 

y 

5' 
5* 

5* 

3' 

5* 

5* 
3* 



EST no 

EST0534081 

EST0534315 

EST0461188 

EST0461428 

EST0958884 

EST1011015 

EST0951375 

EST0953220 

EST0103262 

EST0172673 

ESTO 197390 
ESTO 197391 



Library source 

d 13.5- 14.5 mouse 
embryo 

d!3.5-14.5 mouse 
embryo 



d!3 embryo 
spleen 



Library source 



retina 



retina 



foetal heart 

foetal heart 

retina 

placenta 

JurkatT- 
lymphocyte 

JurkatT- 
lymphocyte 

placenta 

foeatl liver/spleen 
placenta 



Contig 
m4.1 

m4.1 



d 8.5 mouse embryo m4. 1 



m4.l 
m4.1 



Contig 

M.2 

h4.2 

h4.2 

b4.2 

H4.2 

h4.2 

h4.2 

h4.1 

h4.2 

h4.2 

h4.2 
h4.2 
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yh77gll 

yh87h05 

yi45h07 
yj04e06 

yql2h06 
yq56a06 
yq60e02 

yq92g03 

yq97h06 

yr90f01 
yt69c03 

yv30a08 
yv55f07 

yv57h09 

yv87h02 
yv98el 1 

yw68dl0 
yw82a03 



5' 

y 

5' 
3' 



5 f 
3* 

5' 

y 

5' 

y 

5' 
3' 

5* 
3' 

5* 

5' 

y 
y 

5' 
3' 

5 f 
3' 

5' 

5' 
3' 

5* 

5* 



EST0203418 
EST0203419 

EST0204888 
EST0204773 

EST0246604 

EST0258541 
EST0258285 

EST0309968 

EST0346924 

EST0347259 
EST0347209 

EST0355932 
EST0355884 

EST0357618 
EST0357416 

EST0372402 

ESTO338395 
EST0338303 

EST0458506 

EST0465391 
EST0463331 

EST0464336 
EST0458765 

EST0388085 

EST0400679 
EST0400680 

EST0441370 

EST0463005 



placenta 

placenta 

placenta 
placenta 

foetal liver spleen 
foetal liver spleen 
foetal liver spleen 

foetal liver spleen 

foetal liver spleen 

foetal liver spleen 
foetal liver spleen 

foetal liver spleen 
foetal liver spleen 

foetal liver spleen 

melanocyte 
melanocyte 

placenta (8-9 wk) 
placenta (8-9 wk) 



h4.2 
h4.1 

h4.1 
h4.1 

h4.2 

h4.1 
h4.1 

h4.2 

h4.2 

h4.2 
h4.2 

h4.2 
h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
h4.2 

h4.2 

h4.2 
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yx08a07 
yx72h06 

yx76b09 
yy37h08 
yy66b02 

za81f08 
zbI8f07 
zc06e08 

zdl4g06 
zd51hl2 
zd52b09 

ze25gll 
ze69f02 

zf54fD3 
zh96c07 

zv66h!2 
zs83a08 

2s83g08 



y 
y 

5* 

y 

5* 
5' 
5 f 

5* 

y 

5* 

y 
y 
y 

5' 

y 



5' 
3' 

5' 

5' 
3* 

5* 

5* 

3' 

5* 

y 



EST0433678 

EST0407016 

EST0435158 
EST0422871 

EST0434011 

EST0451704 

EST0505446 

EST0511777 

EST0485315 

EST0540473 
ESTO540354 

EST0564666 

EST0578099 

EST0582012 
EST0581958 

EST0679543 

EST0635563 
EST0635472 

EST0680111 

ESTO616241 
EST0615745 

EST1043265 

EST0920072 

ESTO920016 

EST0920121 

EST0920122 



melanoocyte 

melanoocyte 
melanoocyte 

melanoocyte 

melanoocyte 



multiple sclerosis 
lesion 



foetal lung 
foetal lung 
parathyroid tumor 

foetal heart 
foetal heart 
foetal heart 

foetal heart 
retina 

retina 

foetal liver spleen 

8-9w foetus 

germinal centre B 
cell 



germinal centre B 
cell 



h4.1 

h4.1 

h4.2 
h4.1 

h4.2 

h4.2 

h4.2 

h4.2 

h4.1 

h4J 
h4.1 

h4.1 

h4.1 

h4.1 
h4.1 

h4.1 

Ma 
MA 

h4.2 

h4.2 
h4.2 

h4.2 

h4.1 

h4.1 

h4.1 

h4.1 
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Table 5.1 

Summary of ESTs derived from mouse SOCS-5 cDNAs

SOCS    Species  EST name   End  EST no      Library source           Contig
SOCS-5  Mouse    mc55a01    5'   EST0541556  d13.5-14.5 mouse embryo  m5.1
                 mh98f09    5'   EST0638237  placenta                 m5.1
                 my26h12    5'   EST0859939  mixed organs             m5.1
                 ve24e06    5'   EST0819106  heart                    m5.1

Table 5.2

Summary of ESTs derived from human SOCS-5 cDNAs

SOCS    Species  EST name   End  EST no      Library source           Contig
SOCS-5  Human    EST15B103  ?    EST0258029  adipose tissue           h5.1
                 EST15B105  ?    EST0258028  adipose tissue           h5.1
                 EST27530   5'   EST0965892  cerebellum               h5.1
                 zfSOfOl    5'   EST0679820  retina                   h5.1



Table 6.1 

Summary of ESTs derived from mouse SOCS-6 cDNAs 

SOCS    Species  EST name  End  EST no      Library source       Contig
SOCS-6  Mouse    mc04c05   5'   EST0525832  d19.5 embryo         m6.1
                 md48a03   5'   EST0566730  d13.5-14.5 embryo    m6.1
                 mf31d03   5'   EST0675970  d13.5-14.5 embryo    m6.1
                 mh26b07   5'   EST0628752  d13.5-14.5 placenta  m6.1
                 mh78e11   5'   EST0637608  d13.5-14.5 placenta  m6.1
                 mh88h09   5'   EST0644383  d13.5-14.5 placenta  m6.1
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mh94h07 

mi27h04 

mj29c05 

mp66g04 

mw75g03 

va53b05 

vb34h02 

vc55d07 

vc59e05 

vc67d03 

vc68dl0 

vc97h01 

vc99c08 

vd07h03 

vdOScOl 

vd09bl2 

vdl9b02 

vd29a04 

vd46d06 



y 
y 

5* 

y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 



EST0638078 
EST0644252 
EST0664093 
ESTO757905 
EST0847938 
EST09O1540 
ESTO930132 
EST1 057735 
EST1 058201 
EST1 057849 
EST1 058663 
EST1 059343 
EST1059410 
EST1 058173 
EST1 058275 
EST1058632 
EST1 059723 
? none found 
? none found 



dl3.5-I4.5 placenta 
d!3.5-14.5 embryo 
dl3.5-14.5 embryo 
thymus 
liver 

dI2^5 embryo 
lymph node 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 
2 cell embryo 



m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.1 

m6.I 

m6.1 
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Table 6.2 

Summary of ESTs derived from human SOCS-6 cDNAs



SOCS Species EST name End 

SOCS- 6 Human 

yf61e08 5' 

yf93a09 5' 

yg05fl2 5' 

yg41f04 5' 

yg45c02 5 f 

yhllflO 5* 

yhl3b05 5' 
3 » 

zc35al2 5' 



ze02h08 

zl09a03 

zl69el0 
zn39d08 
zo39e06 



5' 
3 ' 

5* 
3 ' 

5' 

5* 

5 1 



EST no 

EST0184387 

EST0186084 

EST0191486 

EST0195017 

EST0185308 

EST0236705 

EST0237191 
EST0236958 

EST0555518 

EST0603826 
EST0603718 

EST077393 6 
EST0773892 

EST0683363 

EST0718885 

EST0785947 



Library source 

d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 



senescent 
fibroblasts 

foetal heart 



pregnant uterus 
colon 

endothelial cell 
endothelial cell 



Contig 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 
h6.2 

h6.1 

h6.1 
h6.2 

h6*l 
h6.1 

h6.1 

h6.1 

h6.1 



Table 7.1 

Summary of ESTs derived from mouse SOCS-7 cDNAs

SOCS    Species  EST name  End  EST no      Library source     Contig
SOCS-7  Mouse    mj39a01   5'   EST0665627  d13.5/14.5 embryo  m7.1
                 vi52h07   5'   EST1267404  d7.5 embryo        m7.1

Table 7.2

Summary of ESTs derived from human SOCS-7 cDNAs



SOCS Species EST name 
SOCS-7 HUMAN STS WI-30171 
EST00939 
EST12913 
yc29b05 
yp49fl0 
rtl0f03 

zx73g04 



End EST no 

(G21563) 



Library source Contig 
Chromosome 2 h7.2 



5* EST0000906 hippocampus h7. 1 

3* EST0944382 uterus h7.2 

y ESTO 128727 liver h7.2 

3* EST0301 9 1 4 retina h7.2 

5' EST0922932 germinal centre h7.2 
Bcell 

3* EST0921231 h7.1 

3* EST1 102975 ovarian tumour h7.1 



Table 8.1 

Summary of ESTs derived from mouse SOCS-8 cDNAs

SOCS    Species  EST name  End  EST no      Library source     Contig
SOCS-8  Mouse    mj16e09   r1   EST0666240  d13.5/14.5 embryo  m8.1
                 vj27a029  r1   EST1155973  heart              m8.1



Table 9.1 

Summary of ESTs derived from mouse SOCS-9 cDNAs

SOCS    Species  EST name  End  EST no      Library source     Contig
SOCS-9  Mouse    me65d05   5'   EST0585211  d13.5/14.5 embryo  m9.1

Table 9.2

Summary of ESTs derived from human SOCS-9 cDNAs



SOCS Species EST name 
SOCS-9 Human CSRL-83f2-u 
EST1 14054 



End EST no 
(B06659) 



5' EST0939759 placenta 
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yy06b07 
yy06g06 
zr40c09 

zr72h01 

yx92c08 
yx93b08 
hfc0662 



3' 
5* 
5' 

5' 

y 

5' 
5' 
5' 



EST0434504 melanocyte h9. 1 

EST0443783 melanocyte h9. 1 

EST0832461 melanocyte, heart, h9.1 
uterus 

EST0892025 melanocyte, heart, h9.1 
uterus 

h9.1 



EST0892026 

EST0441160 melanocyte 

EST044 1 260 melanocyte 

EST08896 1 1 foetal heart 



h9.1 
h9.1 
h9.1 



Table 10.1 

Summary of ESTs derived from mouse SOCS-10 cDNAs

SOCS     Species  EST name  End  EST no      Library source          Contig
SOCS-10  Mouse    mb14d12   5'   EST0549887  d19.5 embryo            m10.1
                  mb40f06   5'   EST0515064  d19.5 embryo            m10.1
                  mg89b11   5'   EST0630631  d13.5-14.5 embryo       m10.1
                  mq89e12   5'   EST0776015  heart                   m10.1
                  mp03g12   5'   EST0741991  heart                   m10.1
                  vh53c11   5'   EST1154634  mammary gland           m10.1

Table 10.2

Summary of ESTs derived from human SOCS-10 cDNAs

SOCS     Species  EST name  End  EST no      Library source          Contig
SOCS-10  Human    aa48h10   3'   EST1135220  germinal centre B cell  h10.2
                  zp35h01   3'   EST0819137  muscle                  h10.2
                  zp97h12   5'   EST0835442  muscle                  h10.2
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zq08h01 
zr34g05 

EST73000 



3' 
5' 
5' 
3' 
5 



HSDHEI005 ? 



EST0831211 
EST0835907 
EST0834251 
EST0834440 
ESTI 004491 
EST0O139O6 



Table 11.1 

Summary of ESTs derived from human SOCS-11 cDNAs



SOCS Species EST name End 
SOCS-11 Human 2t24h06 rl 



zr43b02 



rl 
si 



EST no 

EST0925023 

EST0873006 
EST0872954 



muscle 

melanocyte, heart* 
uterus 



ovary 
heart 

Library source 

ovarian tumor 



hl0.2 
hlO.l 
hi 0.2 
hl0.2 
hl0.2 
hi 0.2 



Table 12.1 

Summary of ESTs derived from mouse SOCS-12 cDNAs 



Contig 
11.1 



melanocyte, heart, uterus 11.1 
11.1 



SOCS     Species  EST name  End  EST no      Library source                  Contig
SOCS-12  Mouse    EST03803  5'   EST1054173  day 7.5 emb ectoplacental cone  m12.1
                  mt18f02   5'   EST0817652  3NbMS spleen                    m12.1
                  mz60g10   5'   EST0890872  lymph node                      m12.1
                  vaOScll   5'   EST0909449  lymph node                      m12.1

Table 12.2

Summary of ESTs derived from human SOCS-12 cDNAs



SOCS Species EST name 



End EST no Library source Contig 



SOCS-12 Human STS-SHGC- 13867 



ESTI 77695 



Chromosome 2 hi 2.2 



5' EST0948071 Jurkat cells 



hlZl 
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EST64550 
EST76868 
PMY2369 
yb38f04 

yg74el2 
yhl3g04 

yh48b06 
yh53a05 

yn48h09 

yn90a09 
yo08f03 

yolleOl 
yo63b!2 

yq56g02 
zh57c04 
zh79h01 
zh99all 
zo92hl2 

zs48c01 



5' 

5' 

5' 

5 f 
3* 

5* 

5' 
3' 

5* 

5* 
3' 

5' 
3' 

3' 

5* 
3* 

3* 

5' 

3\ 

3' 

y 
y 
y 

5' 
3' 

5' 
3* 



EST0997367 Jurkat cells 

EST100729I pineal body 

EST11 15998 KG-1 

ESTO 1 08807 foetal spleen 



EST0224407 

EST0237226 
ESTO236992 

yh48b06 

EST01 97282 
EST0197486 

EST0278258 
EST0278259 

EST0302557 

ESTO301790 
EST0302059 

? none found 

EST0303606 
EST0304085 

EST0346935 

EST0594201 

EST0598945 

EST06 18570 

EST0803392 
EST0803393 

ESTO925714 
EST0925530 



d73 brain 
d73 brain 

placenta 
placenta 

brain 

brain 
brain 



breast 

foetal liver spleen 
foetal liver spleen 
foetal liver spleen 
foetal liver spleen 
ovarian cancer 



germinal centre 
Bcell 



hl2.1 

hi 2.2 

hl2.1 

hi 2.1 
hi 2.2 

hi 2.1 

hlZl 
hl2.2 

hi 2.2 

hl2.2 
hi 2.2 

hl2.2 
hl2.2 

hl2.2 

hl2.2 
hi 2.2 

hl2.2 

hi 2.2 
hl2.2 

hl2.1 

hi 2.2 

hl2.2 

hi 2.2 

hlZl 
hl2.2 

hl2.1 
hi 2.2 
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zs45h02 



3* EST0932296 germinal centre hi 2.2 
Bcell 



Table 13.1 

Summary of ESTs derived from mouse SOCS-13 cDNAs 



SOCS     Species  EST name  End  EST no      Library source        Contig
SOCS-13  Mouse    ma39c09   5'   EST0517875  day 19.5 embryo       m13.1
                  me60c05   5'   EST0584950  day 13.5/14.5 embryo  m13.1
                  mi78g05   5'   EST0653834  day 19.5 embryo       m13.1
                  mk10c11   5'   EST0735158  day 19.5 embryo       m13.1
                  mo48g12   5'   EST0745111  day 10.5 embryo       m13.1
                  mp94a01   5'   EST0762827  thymus                m13.1
                  vb57c07   5'   EST1028976  day 11.5 embryo       m13.1
                  vh07c11   5'   EST1117269  mammary gland         m13.1



Table 13.2 

Summary of ESTs derived from human SOCS-13 cDNAs

SOCS     Species  EST name  End  EST no      Library source  Contig
SOCS-13  Human    EST59161  5'   EST0992726  infant brain    h13.1

Table 14.1

Summary of ESTs derived from mouse SOCS-14 cDNAs

SOCS     Species  EST name  End  EST no      Library source  Contig
SOCS-14  Mouse    mi75e03   5'   EST0651892  d19.5 embryo    m14.1
                  vd29h11   5'   EST1067080  2 cell embryo   m14.1
                  vd53g07        EST1119627  2 cell embryo   m14.1



Table 15.1 

Summary of ESTs derived from mouse SOCS-15 cDNAs

SOCS     Species  EST name  End  EST no      Library source    Contig
SOCS-15  Mouse    mh29b05   5'   EST0628834  placenta          m15.1
                  mh98h09   5'   EST0638243  placenta          m15.1
                  ml45a02   5'   EST0687171  testis            m15.1
                  mu43a10   5'   EST851588   thymus            m15.1
                  my38c09   5'   EST878461   pooled organs     m15.1
                  vj37h07   5'   EST1174791  diaphragm         m15.1
                  AC002393                   Chromosome 6 BAC  m15.1



Table 15.2 

Summary of ESTs derived from human SOCS-15 cDNAs 



SOCS Species 
SOCS-15 Human 



EST name End EST no Library source Contig 

EST98889 5' EST1026568 thyroid hl5.1 
3* EST1 138057 colon tumour hl5.1 



ne48bo5 
yb!2hl2 

HSU47924 



EST0098885 placenta hi 5.1 

EST0098886 hlS.l 

Chromosome 12 hl5.1 
BAC 
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SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: (Other than US) THE WALTER AND ELIZA HALL INSTITUTE OF 

MEDICAL RESEARCH 
(US Only) 

(ii) TITLE OF INVENTION: THERAPEUTIC AND DIAGNOSTIC AGENTS 

(iii) NUMBER OF SEQUENCES: 49

(A) ADDRESSEE: DAVIES COLLISON CAVE

(B) STREET: 1 LITTLE COLLINS STREET

(C) CITY: MELBOURNE

(D) STATE: VICTORIA

(E) COUNTRY: AUSTRALIA

(F) ZIP: 3000

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(C) REFERENCE/DOCKET NUMBER: EJH/EK

(ix) TELECOMMUNICATION INFORMATION:

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CACGCCGCCC ACGTGAAGGC                                                20



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TTCGCCAATG ACAAGACGCT                                                20
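The declared lengths of the two short sequences above can be checked mechanically against the printed bases. The sketch below copies SEQ ID NO:1 and SEQ ID NO:2 from the listing; the helper names are hypothetical, and the GC fraction is included only as a generic descriptive statistic, not as something the specification reports.

```python
# Sequences copied from the listing; the space is the 10-base grouping gap.
SEQ_ID_1 = "CACGCCGCCC ACGTGAAGGC"
SEQ_ID_2 = "TTCGCCAATG ACAAGACGCT"
DECLARED_LENGTH = 20  # "(A) LENGTH: 20 base pairs" for both entries


def clean(seq: str) -> str:
    """Remove grouping whitespace and normalise to upper case."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    bases = clean(seq)
    return (bases.count("G") + bases.count("C")) / len(bases)


if __name__ == "__main__":
    for name, seq in (("SEQ ID NO:1", SEQ_ID_1), ("SEQ ID NO:2", SEQ_ID_2)):
        bases = clean(seq)
        status = "ok" if len(bases) == DECLARED_LENGTH else "length mismatch"
        print(f"{name}: {len(bases)} nt ({status}), GC = {gc_fraction(seq):.2f}")
```

The same check is a cheap way to catch bases dropped or duplicated during text extraction of longer entries in the listing.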



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 123 6 base pairs 
<B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

<A) NAME /KEY: CDS 

(B) LOCATION: 1..636 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



CGAGGCTCAA GCTCCGGGCG GATTCTGCGT GCCGCTCTCG CTCCTTGGGG TCTGTTGGCC 
GGCCTGTGCC ACCCGGACGC CCGGCTCACT GCC TCTGTCT CCCCCATCAG CGCAGCCCCG 
GACGCTATGG CCCACCCCTC CAGCTGGCCC CTCGAGTAGG 



-101 
-41 
-1 



ATG GTA GCA CGC AAC CAG GTG GCA GCC GAC AAT GCG ATC TCC CCG GCA 48
Met Val Ala Arg Asn Gln Val Ala Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

GCA GAG CCC CGA CGG CGG TCA GAG CCC TCC TCG TCC TCG TCT TCG TCC 96
Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser
20 25 30

TCG CCA GCG GCC CCC GTG CGT CCC CGG CCC TGC CCG GCG GTC CCA GCC 144
Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala
35 40 45

CCA GCC CCT GGC GAC ACT CAC TTC CGC ACC TTC CGC TCC CAC TCC GAT 192 
Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp 
50 55 60 

TAC CGG CGC ATC ACG CGG ACC AGC GCG CTC CTG GAC GCC TGC GGC TTC 240 
Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80 

TAT TGG GGA CCC CTG AGC GTG CAC GGG GCG CAC GAG CGG CTG CGT GCC 288 
Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala 
85 90 95 

GAG CCC GTG GGC ACC TTC TTG GTG CGC GAC AGT CGT CAA CGG AAC TGC 336
Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110 

TTC TTC GCG CTC AGC GTG AAG ATG GCT TCG GGC CCC ACG AGC ATC CGC 384 
Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125 

GTG CAC TTC CAG GCC GGC CGC TTC CAC TTG GAC GGC AGC CGC GAG ACC 432 
Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140 

TTC GAC TGC CTT TTC GAG CTG CTG GAG CAC TAC GTG GCG GCG CCG CGC 480 
Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg 
145 150 155 160 

CGC ATG TTG GGG GCC CCG CTG CGC CAG CGC CGC GTG CGG CCG CTG CAG 528 
Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

GAG CTG TGT CGC CAG CGC ATC GTG GCC GCC GTG GGT CGC GAG AAC CTG 576 
Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190 

GCG CGC ATC CCT CTT AAC CCG GTA CTC CGT GAC TAC CTG AGT TCC TTC 624 
Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205 

CCC TTC CAG ATC TGA CCGGCTG CCGCTGTGCC GCAGCATTAA GTGGGGGCGC 676 
Pro Phe Gln Ile *
210 

CTTATTATTT CTTATTATTA ATTATTATTA TTTTTCTGGA ACCACGTGGG AGCCCTCCCC 736 

GCCTGGGTCG GAGGGAGTGG TTGTGGAGGG TGAGATGCCT CCCACTTCTG GCTGGAGACC 796 

TCATCCCACC TCTCAGGGGT GGGGGTGCTC CCCTCCTGGT GCTCCCTCCG GGTCCCCCCT 856 

GGTTGTAGCA GCTTGTGTCT GGGGCCAGGA CCTGAATTCC ACTCCTACCT CTCCATGTTT 916 

ACATATTCCC AGTATCTTTG CACAAACCAG GGGTCGGGGA GGGTCTCTGG CTTCATTTTT 976 

CTGCTGTGCA GAATATCCTA TTTTATATTT TTACAGCCAG TTTAGGTAAT AAACTTTATT 1036 

ATGAAAGTTT TTTTTTAAAA GAAAAAAAAA AAAAAAAAA 1075 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Val Ala Arg Asn Gln Val Ala Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser
20 25 30

Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala
35 40 45

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp
50 55 60

Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala
85 90 95

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg
145 150 155 160

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190

Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205

Pro Phe Gln Ile
210



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1121 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 223..819

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCGATCTGTG GGTGACAGTG TCTGCGAGAG ACTTTGCCAC ACCATTCTGC CGGAATTTGG 60 

AGAAAAAGAA CCAGCCGCTT CCAGTCCCCT CCCCCTCCGC CACCATTTCG GACACCCTGC 120

ACACTCTCGT TTTGGGGTAC CCTGTGACTT CCAGGCAGCA CGCGAGGTCC ACTGGCCCCA 180 

GCTCGGGCGA CCAGCTGTCT GGGACGTGTT GACTCATCTC CC ATG ACC CTG CGG 234 

Met Thr Leu Arg 
1 

TGC CTG GAG CCC TCC GGG AAT GGA GCG GAC AGG ACG CGG AGC CAG TGG 282 
Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr Arg Ser Gln Trp
5 10 15 20 

GGG ACC GCG GGG TTG CCG GAG GAA CAG TCC CCC GAG GCG GCG CGT CTG 330 
Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu Ala Ala Arg Leu
25 30 35 

GCG AAA GCC CTG CGC GAG CTC AGT CAA ACA GGA TGG TAC TGG GGA AGT 378 
Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp Tyr Trp Gly Ser
40 45 50 

ATG ACT GTT AAT GAA GCC AAA GAG AAA TTA AAA GAG GCT CCA GAA GGA 426 
Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu Ala Pro Glu Gly 
55 60 65 

ACT TTC TTG ATT AGA GAT AGT TCG CAT TCA GAC TAC CTA CTA ACT ATA 474 
Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr Leu Leu Thr Ile
70 75 80 

TCC GTT AAG ACG TCA GCT GGA CCG ACT AAC CTG CGG ATT GAG TAC CAA 522 
Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg Ile Glu Tyr Gln
85 90 95 100 

GAT GGG AAA TTC AGA TTG GAT TCT ATC ATA TGT GTC AAG TCC AAG CTT 570 
Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val Lys Ser Lys Leu
105 110 115 

AAA CAG TTT GAC AGT GTG GTT CAT CTG ATT GAC TAC TAT GTC CAG ATG 618 
Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr Tyr Val Gln Met
120 125 130 

TGC AAG GAT AAA CGG ACA GGC CCA GAA GCC CCA CGG AAT GGG ACT GTT 666 
Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg Asn Gly Thr Val 
135 140 145 

CAC CTG TAC CTG ACC AAA CCT CTG TAT ACA TCA GCA CCC ACT CTG CAG 714 
His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala Pro Thr Leu Gln
150 155 160 

CAT TTC TGT CGA CTC GCC ATT AAC AAA TGT ACC GGT ACG ATC TGG GGA 762 
His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly Thr Ile Trp Gly
165 170 175 180 

CTG CCT TTA CCA ACA AGA CTA AAA GAT TAC TTG GAA GAA TAT AAA TTC 810 
Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu Glu Tyr Lys Phe 
185 190 195 

CAG GTA TAAGTATTTC TCTCTCTTTT TCGTTTTTTT TTAAAAAAAA AAAAACACAT 866 
Gln Val

GCCTCATATA GACTATCTCC GAATGCAGCT ATGTGAAAGA GAACCCAGAG GCCCTCCTCT 926

GGATAACTGC GCAGAATTCT CTCTTAAGGA CAGTTGGGCT CAGTCTAACT TAAAGGTGTG 986

AAGATGTAGC TAGGTATTTT AAAGTTCCCC TTAGGTAGTT TTAGCTGAAT GATGCTTTCT 1046

TTCCTATGGC TGCTCAAGAT CAAATGGCCC TTTTAAATGA AACAAAACAA AACAAAACAA 1106

AAAAAAAAAA AAAAA 1121



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Thr Leu Arg Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr
1 5 10 15

Arg Ser Gln Trp Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu
20 25 30

Ala Ala Arg Leu Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp
35 40 45

Tyr Trp Gly Ser Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu
50 55 60

Ala Pro Glu Gly Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr
65 70 75 80

Leu Leu Thr Ile Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg
85 90 95

Ile Glu Tyr Gln Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val
100 105 110

Lys Ser Lys Leu Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr
115 120 125

Tyr Val Gln Met Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg
130 135 140

Asn Gly Thr Val His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala
145 150 155 160

Pro Thr Leu Gln His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly
165 170 175

Thr Ile Trp Gly Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu
180 185 190

Glu Tyr Lys Phe Gln Val
195

(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2187 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 18..695

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CGCTGGCTCC GTGCGCC ATG GTC ACC CAC AGC AAG TTT CCC GCC GCC GGG 50 
Met Val Thr His Ser Lys Phe Pro Ala Ala Gly 
1 5 10

ATG AGC CGC CCC CTG GAC ACC AGC CTG CGC CTC AAG ACC TTC AGC TCC 98 
Met Ser Arg Pro Leu Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser 
15 20 25 

AAA AGC GAG TAC CAG CTG GTG GTG AAC GCC GTG CGC AAG CTG CAG GAG 146 
Lys Ser Glu Tyr Gln Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu
30 35 40 

AGC GGA TTC TAC TGG AGC GCC GTG ACC GGC GGC GAG GCG AAC CTG CTG 194 
Ser Gly Phe Tyr Trp Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu
45 50 55 

CTC AGC GCC GAG CCC GCG GGC ACC TTT CTT ATC CGC GAC AGC TCG GAC 242 
Leu Ser Ala Glu Pro Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp
60 65 70 75 

CAG CGC CAC TTC TTC ACG TTG AGC GTC AAG ACC CAG TCG GGG ACC AAG 290 
Gln Arg His Phe Phe Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys
80 85 90 

AAC CTA CGC ATC CAG TGT GAG GGG GGC AGC TTT TCG CTG CAG AGT GAC 338 
Asn Leu Arg Ile Gln Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp
95 100 105 

CCC CGA AGC ACG CAG CCA GTT CCC CGC TTC GAC TGT GTA CTC AAG CTG 386 
Pro Arg Ser Thr Gln Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu
110 115 120 

GTG CAC CAC TAC ATG CCG CCT CCA GGG ACC CCC TCC TTT TCT TTG CCA 434 
Val His His Tyr Met Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro 
125 130 135 

CCC ACG GAA CCC TCG TCC GAA GTT CCG GAG CAG CCA CCT GCC CAG GCA 482 
Pro Thr Glu Pro Ser Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala
140 145 150 155 

CTC CCC GGG AGT ACC CCC AAG AGA GCT TAC TAC ATC TAT TCT GGG GGC 530 
Leu Pro Gly Ser Thr Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly
160 165 170 

GAG AAG ATT CCG CTG GTA CTG AGC CGA CCT CTC TCC TCC AAC GTG GCC 578 
Glu Lys Ile Pro Leu Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala
175 180 185 

ACC CTC CAG CAT CTT TGT CGG AAG ACT GTC AAC GGC CAC CTG GAC TCC 626 
Thr Leu Gln His Leu Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser
190 195 200 

TAT GAG AAA GTG ACC CAG CTG CCT GGA CCC ATT CGG GAG TTC CTG GAT 674 
Tyr Glu Lys Val Thr Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp
205 210 215 

CAG TAT GAT GCT CCA CTT TAAGGAGCAA AAGGGTCAGA GGGGGGCCTG 722
Gln Tyr Asp Ala Pro Leu
220 225

GGTCGGTCGG TCGCCTCTCC TCCGAGGCAC ATGGCACAAG CACAAAAATC CAGCCCCAAC 782

GGTCGGTAGC TCCCAGTGAG CCAGGGGCAG ATTGGCTTCT TCCTCAGGCC [illegible] 842

GCAGAGTAGA GCTGGCAGGA CCTGGAATTC GTCTGAGGGG AGGGGGAGCT [illegible] 902

TTCCCCCCTC CCCCAGCTCC AGCTTCTTTC AAGTGGAGCC AGCCGGCCTG [illegible] 962

ACAATACCTT TGACAAGCGG ACTCTCCCCT CCCCTTCCTC [illegible] [illegible] 1022

AGGGAGGTGG GGACACCTCC AAGTGTTGAA CTTAGAACTG [illegible] [illegible] 1082

TCCCGCTGGA ACTTGTTTGC GCTTTGATTT GGTTTGATCA [illegible] [illegible] 1142

GGATGGAAGA GAAAAGGGTG TGTGAAGGGT TTTTATGCTG [illegible] [illegible] 1202

CACTGCCCAA CCTAGGTGAG GAGTGGTGGC TCCTGGCTCT [illegible] [illegible] 1262

ACCTGAAGAG AGCTATACTG GTGCCAGGCT CCTCTCCATG [illegible] [illegible] 1322

CAGATCCCTT GCACCCCAGA ACCCTCCCCG [illegible] [illegible] [illegible] 1382

GACAGATGAG GCTGGTGAGC TGGCCGCCTT [illegible] [illegible] [illegible] 1442

ATGAGCCATC TTGGAGCCCA GGTTTCCCCT [illegible] [illegible] [illegible] 1502

CCTATGTGGG GCTAGGAGAC TCGCCTTAAA TGCCCTCTGT [illegible] [illegible] 1562

CACAAGGAGC CAAACACAGC CAATAGGCAG AGAGTTGAGG [illegible] [illegible] 1622

[illegible] [illegible] [illegible] CCAGTCACTC CAGGAGACTC CTGAGTTAAC 1682


ACTGGGAAGA CATTGGCCAG TCCTAGTCAT CTCTCGGTCA GTAGGTCCGA GAGCTTCCAG 1742

GCCCTGCACA GCCCTCCTTT CTCACCTGGG GGGAGGCAGG AGGTGATGGA GAAGCCTTCC 1802

CATGCCGCTC ACAGGGGCCT CACGGGAATG CAGCAGCCAT GCAATTACCT GGAACTGGTC 1862

CTGTGTTGGG GAGAAACAAG TTTTCTGAAG TCAGGTATGG GGCTGGGTGG GGCAGCTGTG 1922

TGTTGGGGTG GCTTTTTTCT CTCTGTTTTG AATAATGTTT ACAATTTGCC TCAATCACTT 1982

TTATAAAAAT CCACCTCCAG CCCGCCCCTC TCCCCACTCA GGCCTTCGAG GCTGTCTGAA 2042

GATGCTTGAA AAACTCAACC AAATCCCAGT TCAACTCAGA CTTTGCACAT ATATTTATAT 2102

TTATACTCAG AAAAGAAACA TTTCAGTAAT TTATAATAAA AGAGCACTAT TTTTTAATGA 2162

AAAAAAAAAA AAAAAAAAAA AAAAA 2187



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 225 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Val Thr His Ser Lys Phe Pro Ala Ala Gly Met Ser Arg Pro Leu
1 5 10 15

Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser Lys Ser Glu Tyr Gln
20 25 30

Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu Ser Gly Phe Tyr Trp
35 40 45

Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu Leu Ser Ala Glu Pro
50 55 60

Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp Gln Arg His Phe Phe
65 70 75 80

Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys Asn Leu Arg Ile Gln
85 90 95

Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp Pro Arg Ser Thr Gln
100 105 110

Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu Val His His Tyr Met
115 120 125

Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro Pro Thr Glu Pro Ser
130 135 140

Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala Leu Pro Gly Ser Thr
145 150 155 160

Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly Glu Lys Ile Pro Leu
165 170 175

Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala Thr Leu Gln His Leu
180 185 190

Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser Tyr Glu Lys Val Thr
195 200 205

Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp Gln Tyr Asp Ala Pro
210 215 220

Leu
225



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1094 base pairs 

(B) TYPE: nucleic acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 



CTCCGGCTGG [illegible] AGGATGGTAG CACACAACCA GGTGGCAGCC GACAATGCAG 60

TCTCCACAGC AGCAGAGCCC CGACGGCGGC CAGAACCTTC CTCCTCTTCC TCCTCCTCGC 120

CCGCGGCCCC CGCGCGCCCG CGGCCGTGCC CCGCGGTCCC GGCCCCGGCC CCCGGCGACA 180

CGCACTTCCG CACATTCCGT TCGCACGCCG ATTACCGGCG CATCACGCGC GCCAGCGCGC 240

TCCTGGACGC CTGCGGATTC TACTGGGGGC CCCTGAGCGT GCACGGGGCG CACGAGCGGC 300

TGCGCGCCGA GCCCGTGGGC ACCTTCCTGG TGCGCGACAG CCGCCAGCGG AACTGCTTTT 360

TCGCCCTTAG CGTGAAGATG GCCTCGGGAC CCACGAGCAT CCGCGTGCAC TTTCAGGCCG 420

GCCGCTTTCA CCTGGATGGC AGCCGCGAGA GCTTCGACTG CCTCTTCGAG CTGCTGGAGC 480



225 
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CAAGGCCAGG CCGAGTGGCC AACGGGAGGG GCCCGCGCGC GATTCTGGAG GAGGGCGGCG 600 

GCCCCACAGG TCTCCAGGGC TGGCTAGCCG GGCTCCTAGA GCGGAGACTG CCAAGGCCTT 660 

CGGGTCCTGG GCAGGAAGGA TCCTGGCAGG GAGGAGTTGC TTGGGGGGTG GGGGGGAAAG 720 

GCTCCAGGCG CGGTGGAGCT CTGACCAGGA GAATGCACAC ACTCGGAGGG GAGGAGGCGT 780 

GTCAGCCCCA AGCTAGCATC CCACCCGGGG AGCAGCGATG TGGGGCGAAG GTAGCCAGAG 840 

CAAAAGAGCA GGCACCAGGT GACACGAAAC AGAAGATTCC GGGTAGAGCC AGAACCCCAG 900 

AAGTCCCATT CAGGGAAGGT GCGAGGCGAG AACGAGTTAG GTGGACCCTC TCCAGGGGCA 960 

GCCAAAGAAA TCTAAAGAGA ACCCGAAGGA CTTGCCGGAA AGAGAAACCG AAAGCGGCGG 1020

TGGGCGGGAT CGGTGGGCGG GGCCTCCCTG GTTTAAGAGC TTGATGCAGG GGCGGGCAGC 1080 

AGCAGAGAGA ACTGCGGCCG TGGCAGCGGC ACGGCTCCCG GCCCCGGAGC ATGCGCGACA 1140 

GCAGCCCCGG AACCCCCAGC CGCGGCGCCC CGCGTCCCGC CGCCAGGTGA GCCGAGGCAG 1200 

CTGCGAAGGA GCAGGCGGGA GGGGATGGGA GGAAGGGGAG CAGAGCCTGG CAGGACTATC 1260 

CTCGCAGACT GCATGGCGGG GTCGTGGATG CTATGCCTCT GGCGCCCGCC CCACCGGCTG 1320 

GCCCAGGCGG CCCCTCGCGC GCGCGGGGCG CCGTCAGCCC CTCCTCTCCG GCCCTGAGCC 1380

CGGATCGTCC GCCCGGGTTC CAGTTCCCGG CGTGGCCAGT AGGCGGCAAC CGCGAGGCGG 1440 

CAAGCCACCC AGCGGGGACG GCCTGGAGTC GGGCCCCTCT CCACGCCCCC TTCTCCACGC 1500

GCGCGGGGAG GCAGGGCTCC ACCGCCAGTC TGGAAGGGTT CCACATACAG GAACGGCCTA 1560

CTTCGCAGAT GAGCCCACCG AGGCTCAGGC TCCGGGCGGA TTCTGCGTGT CACCCTCGCT 1620 

CCTTGGGGTC CGCTGGCCGG CCTGTGCCAC CCGGACGCCC GGTTCACTGC CTCTGTCTCC 1680

CCCATCAGCG CAGCCCCGGA CGCTATGGCC CACCCCTCCA GCTGGCCCCT CGAGTAGGAT 1740 

GGTAGCACGT AACCAGGTGG AAGCCGACAA TGCGATCTCC CCGGCATCAG AGCCCCGACG 1800 

GCGGCCAGAG CCATCCTCGT CCTCGTCTTC GTCCTCGCCG GCGGCCCCGG CGCGTCCCCG 1860 

GCCCTGCCCG GTGGTCCCGG CCCCGGCTCC GGGCGACACT CACTTCCGCA CCTTCCGCTC 1920

CCACTCTGAT TACCGGCGCA TCACGCGGAC CAGCGCTCTC CTGGACGCCT GCGGCTTCTA 1980 

CTGGGGACCC CTGAGCGTGC ATGGGGCGCA CGAACGGCTG CGTTCCGAAC CCGTGGGCAC 2040 

CTTCTTGGTG CGCGACAGTC GCCAGCGGAA CTGCTTCTTC GCGCTCAGCG TGAAGATGGC 2100 

TTCGGGCCCC ACGAGCATTC GTGTGCACTT CCAGGCCGGC CGCTTCCACC TGGACGGCAA 2160 

CCGCGAGACC TTCGACTGCC TCTTCGAGCT GCTGGAGCAC TACGTGGCGG CGCCGCGCCG 2220 

CATGTTGGGG GCCCCACTGC GCCAGCGCCG CGTGCGGCCG CTGCAGGAGC TGTGTCGCCA 2280 

GCGCATCGTG GCCGCCGTGG GTCGCGAGAA CCTGGCACGC ATCCCTCTTA ACCCGGTACT 2340 

CCGTGACTAC CTGAGTTCCT TCCCCTTCCA GATCTGACCG GCTGCCGCCG TGCCCGCAGA 2400 

ATTAAGTGGG AGCGCCTTAT TATTTCTTAT TATTAATTAT TATTATTTTT CTGGAACCAC 2460 

GTGGGAGCCC TCCCCGCCTA GGTCGGAGGG AGTGGGTGTG GAGGGTGAGA TCCCTCCCAC 2520 

TTCTGGCTGG AGACCTTATC CCGCCTCTCG GGGGGCCTCC CCTCCTGGTG CTCCCTCCCG 2580

GTCCCCCTGG TTGTAGCAGC TTGTGTCTGG GGCCAGGACC TGAACTCCAC GCCTACCTCT 2640

CCATGTTTAC ATGTTCCCAG TATCTTTGCA CAAACCAGGG GTGGGGGAGG GTCTCTGGCT 2700 

TCATTTTTCT GCTGTGCAGA ATATTCTATT TTATATTTTT ACATCCAGTT TAGATAATAA 2760 

ACTTTATTAT GAAAGTTTTT TTTTTTAAAG AAACAAAGAT TTCTAGA 2807



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Val Ala Arg Asn Gln Val Glu Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ser Glu Pro Arg Arg Arg Pro Glu Pro Ser Ser Ser Ser Ser Ser Ser
20 25 30

Ser Pro Ala Ala Pro Ala Arg Pro Arg Pro Cys Pro Val Val Pro Ala
35 40 45

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp
50 55 60

Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ser
85 90 95

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Asn Arg Glu Thr
130 135 140

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg
145 150 155 160

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190

Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205

Pro Phe Gln Ile
210



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1611 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 263..1529

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CGAATTCCGG GCGGGCTGTG TGAGTCTGTG AGTGGAAGGC GCGCCGGCTC TTTTGTCTGA 60 

GTGTGACCCG GTGGCTTTGT TCCAGGCATT CCGGTGATTT CCTCCGGGCA GTCCGCAGAA 120 

GCCGCAGCGG CCGCCCGCGC TCTCTCTGCA GTCTCCACAC CCGGGAGAGC CTGAGCCCGC 180 

GTCACGCCCC TCAGCCCCCG CTGAGTCCCT TCTCTGTTGT CGCGTCCGAA TCGAGTTCCC 240 

GGAATCAGAC GGTGCCCCAT AG ATG GCC AGC TTT CCC CCG AGG GTT AAC GAG 292
Met Ala Ser Phe Pro Pro Arg Val Asn Glu
1 5 10

AAA GAG ATC GTG AGA TCA CGT ACT ATA GGG GAA CTC TTG GCT CCA GCA 340
Lys Glu Ile Val Arg Ser Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala
15 20 25

GCT CCT TTT GAC AAG AAA TGT GGT GGT GAG AAC TGG ACG GTT GCT TTT 388
Ala Pro Phe Asp Lys Lys Cys Gly Gly Glu Asn Trp Thr Val Ala Phe
30 35 40

GCT CCT GAT GGT TCC TAC TTT GCG TGG TCA CAA GGA TAT CGC ATA GTG 436
Ala Pro Asp Gly Ser Tyr Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val
45 50 55

AAG CTT GTC CCG TGG TCC CAG TGC CGT AAG AAC TTT CTT TTG CAT GGT 484
Lys Leu Val Pro Trp Ser Gln Cys Arg Lys Asn Phe Leu Leu His Gly
60 65 70

TCC AAA AAT GTT ACC AAT TCA AGC TGT CTA AAA TTG GCA AGA CAA AAC 532
Ser Lys Asn Val Thr Asn Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn
75 80 85 90

AGT AAT GGT GGT CAG AAA AAC AAG CCT CCT GAG CAC GTT ATA GAC TGT 580
Ser Asn Gly Gly Gln Lys Asn Lys Pro Pro Glu His Val Ile Asp Cys
95 100 105

GGA GAC ATA GTC TGG AGT CTT GCT TTT GGG TCT TCA GTT CCA GAA AAA 628
Gly Asp Ile Val Trp Ser Leu Ala Phe Gly Ser Ser Val Pro Glu Lys
110 115 120

CAG AGT CGT TGC GTT AAT ATA GAA TGG CAT CGG TTC CGA TTT GGA CAG 676
Gln Ser Arg Cys Val Asn Ile Glu Trp His Arg Phe Arg Phe Gly Gln
125 130 135

GAT CAG CTA CTC CTT GCC ACA GGA TTA AAC AAT GGT CGC ATC AAA ATC 724
Asp Gln Leu Leu Leu Ala Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile
140 145 150

TGG GAT GTA TAT ACA GGA AAA CTC CTC CTT AAT TTG GTA GAC CAC ATT 772
Trp Asp Val Tyr Thr Gly Lys Leu Leu Leu Asn Leu Val Asp His Ile
155 160 165 170

GAA ATG GTT AGA GAT TTA ACT TTT GCT CCA GAT GGG AGC TTA CTC CTT 820
Glu Met Val Arg Asp Leu Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu
175 180 185

GTA TCA GCT TCA AGA GAC AAA ACT CTA AGA GTG TGG GAC CTG AAA GAT 868
Val Ser Ala Ser Arg Asp Lys Thr Leu Arg Val Trp Asp Leu Lys Asp
190 195 200

GAT GGA AAC ATG GTG AAA GTA TTG CGG GCA CAT CAG AAT TGG GTG TAC 916
Asp Gly Asn Met Val Lys Val Leu Arg Ala His Gln Asn Trp Val Tyr
205 210 215

AGT TGT GCA TTC TCT CCC GAC TGT TCT ATG CTG TGT TCA GTG GGC GCC 964
Ser Cys Ala Phe Ser Pro Asp Cys Ser Met Leu Cys Ser Val Gly Ala
220 225 230

AGT AAA GCA GTT TTC CTT TGG AAT ATG GAT AAA TAC ACC ATG ATT AGG 1012
Ser Lys Ala Val Phe Leu Trp Asn Met Asp Lys Tyr Thr Met Ile Arg
235 240 245 250

AAG CTG GAA GGT CAT CAC CAT GAT GTT GTA GCT TGT GAC TTT TCT CCT 1060
Lys Leu Glu Gly His His His Asp Val Val Ala Cys Asp Phe Ser Pro
255 260 265

GAT GGA GCA TTG CTA GCT ACT GCA TCC TAT GAC ACT CGT GTG TAT GTC 1108
Asp Gly Ala Leu Leu Ala Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val
270 275 280

TGG GAT CCA CAC AAT GGA GAC CTT CTG ATG GAG TTT GGG CAC CTG TTT 1156
Trp Asp Pro His Asn Gly Asp Leu Leu Met Glu Phe Gly His Leu Phe
285 290 295

CCC TCG CCC ACT CCA ATA TTT GCT GGA GGA GCA AAT GAC CGA TGG GTG 1204
Pro Ser Pro Thr Pro Ile Phe Ala Gly Gly Ala Asn Asp Arg Trp Val
300 305 310

AGA GCT GTG TCT TTC AGT CAT GAT GGA CTG CAT GTT GCC AGC CTT GCT 1252
Arg Ala Val Ser Phe Ser His Asp Gly Leu His Val Ala Ser Leu Ala
315 320 325 330

GAT GAT AAA ATG GTG AGG TTC TGG AGA ATC GAT GAG GAT TGT CCG GTA 1300
Asp Asp Lys Met Val Arg Phe Trp Arg Ile Asp Glu Asp Cys Pro Val
335 340 345

CAA GTT GCA CCT TTG AGC AAT GGT CTT TGC TGT GCC TTT TCT ACT GAT 1348
Gln Val Ala Pro Leu Ser Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp
350 355 360

GGC AGT GTT TTA GCT GCT GGG ACA CAT GAT GGA AGT GTG TAT TTT TGG 1396
Gly Ser Val Leu Ala Ala Gly Thr His Asp Gly Ser Val Tyr Phe Trp
365 370 375

GCC ACT CCA AGG CAA GTC CCT AGC CTT CAA CAT ATA TGT CGC ATG TCA 1444
Ala Thr Pro Arg Gln Val Pro Ser Leu Gln His Ile Cys Arg Met Ser
380 385 390

ATC CGA AGA GTG ATG TCC ACC CAA GAA GTC CAA AAA CTG CCT GTT CCT 1492
Ile Arg Arg Val Met Ser Thr Gln Glu Val Gln Lys Leu Pro Val Pro
395 400 405 410

TCC AAA ATA TTG GCG TTT CTC TCC TAC CGC GGT TAG A CTGAAGACTG 1539
Ser Lys Ile Leu Ala Phe Leu Ser Tyr Arg Gly *
415 420

CCTTTCCTGG TAGGCCTGCC AGACAGAGCG CCCTTTACAA GACACACCTC AAGCTTTACC 1599 

TCGTGCCGAA TT 1611 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Ala Ser Phe Pro Pro Arg Val Asn Glu Lys Glu Ile Val Arg Ser
1 5 10 15

Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala Ala Pro Phe Asp Lys Lys
20 25 30

Cys Gly Gly Glu Asn Trp Thr Val Ala Phe Ala Pro Asp Gly Ser Tyr
35 40 45

Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val Lys Leu Val Pro Trp Ser
50 55 60

Gln Cys Arg Lys Asn Phe Leu Leu His Gly Ser Lys Asn Val Thr Asn
65 70 75 80

Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn Ser Asn Gly Gly Gln Lys
85 90 95

Asn Lys Pro Pro Glu His Val Ile Asp Cys Gly Asp Ile Val Trp Ser
100 105 110

Leu Ala Phe Gly Ser Ser Val Pro Glu Lys Gln Ser Arg Cys Val Asn
115 120 125






WO 98/20023 PCT/AU97/00729 






Ile Glu Trp His Arg Phe Arg Phe Gly Gln Asp Gln Leu Leu Leu Ala
130 135 140

Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile Trp Asp Val Tyr Thr Gly
145 150 155 160

Lys Leu Leu Leu Asn Leu Val Asp His Ile Glu Met Val Arg Asp Leu
165 170 175

Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu Val Ser Ala Ser Arg Asp
180 185 190

Lys Thr Leu Arg Val Trp Asp Leu Lys Asp Asp Gly Asn Met Val Lys
195 200 205

Val Leu Arg Ala His Gln Asn Trp Val Tyr Ser Cys Ala Phe Ser Pro
210 215 220

Asp Cys Ser Met Leu Cys Ser Val Gly Ala Ser Lys Ala Val Phe Leu
225 230 235 240

Trp Asn Met Asp Lys Tyr Thr Met Ile Arg Lys Leu Glu Gly His His
245 250 255

His Asp Val Val Ala Cys Asp Phe Ser Pro Asp Gly Ala Leu Leu Ala
260 265 270

Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val Trp Asp Pro His Asn Gly
275 280 285

Asp Leu Leu Met Glu Phe Gly His Leu Phe Pro Ser Pro Thr Pro Ile
290 295 300

Phe Ala Gly Gly Ala Asn Asp Arg Trp Val Arg Ala Val Ser Phe Ser
305 310 315 320

His Asp Gly Leu His Val Ala Ser Leu Ala Asp Asp Lys Met Val Arg
325 330 335

Phe Trp Arg Ile Asp Glu Asp Cys Pro Val Gln Val Ala Pro Leu Ser
340 345 350

Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp Gly Ser Val Leu Ala Ala
355 360 365






Gly Thr His Asp Gly Ser Val Tyr Phe Trp Ala Thr Pro Arg Gln Val
370 375 380

Pro Ser Leu Gln His Ile Cys Arg Met Ser Ile Arg Arg Val Met Ser
385 390 395 400

Thr Gln Glu Val Gln Lys Leu Pro Val Pro Ser Lys Ile Leu Ala Phe
405 410 415

Leu Ser Tyr Arg Gly *
420



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 783 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

CTGTCTTCCT CCGCAGCGCG AGGCTGGGTA CAGGGTCTAT TGTCTGTGGT TGACTCCGTA 60 

CTTTGGTCTG AGGCCTTCGG GAGCTTTCCC GAGGCAGTTA GCAGAAGCCG CAGCGACCGC 120 

CCCCGCCCGT CTCCTCTGTC CCTGGGCCCG GGAGACAAAC TTGGCGTCAC GCCCTCAGCG 180 

GTCGCCACTC TCTTCTCTGT TGTTGGGTCC GCATCGTATT CCCGGAATCA GACGGTGCCC 240 

CATAGATGGC CAGCTTTCCC CCGAGGGTCA ACGAGAAAGA GATCGTGAGA TCACGTACTA 300 

TAGGTGAACT TTTAGCTCCT GCAGCTCCTT TTGACAAGAA ATGTGGTCGT GAAAATTGGA 360

CTGTTGCTTT TGCTCCAGAT GGTTCATACT TTGCTTGGTC ACAAGGACAT CGCACAGTAA 420 

AGCTTGTTCC GTGGTCCCAG TGCCTTCAGA ACTTTCTCTT GCATGGCACC AAGAATGTTA 480 

CCAATTCAAG CAGTTTAAGA TTGCCAAGAC AAAATAGTGA TGGTGGTCAG AAAAATAAGC 540 

CTCGTGACAT ATTATAGACT GTGGAGATAT AGTCTGGAGT CTTGCTTTTG GGTCATCAGT 600

TCCAGAAAAA CAGAGTCGCT GTGTAAATAT AGAATGGCAT CGCTTCAGAT TTGGACAAGA 660

TCAGCTACTT CTTGCTACAG GGTTGAACAA TGGGCGTATC AAAATATGGG ATGTATATCA 720

GGAAACTCCT CCTTAACTTG GTAGATCATA CTGAAGTGGT CAGAGATTTA ACTTTTGCTC 780

CAG 783

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1122 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CTCTGTATGT CTGAATGAAG CTATAACATT TGCCTTTTTA TTGCAGGTTT TCCTTTGGAA 60

TATGGATAAA TACACCATGA TACGGAAACT AGAAGGACAT CACCATGATG TGGTAGCTTG 120

TGACTTTTCT CCTGATGGAG CATTACTGGC TACTGCATCT TATGATACTC GAGTATATAT 180

CTGGGATCCA CATAATGGAG ACATTCTGAT GGAATTTGGG CACCTGTTTC CCCCACCTAC 240

TCCAATATTT GCTGGAGGAG CAAATGACCG GTGGGTACGA TCTGTATCTT TTAGCCATGA 300

TGGACTGCAT GTTGCAAGCC TTGCTGATGA TAAAATGGTG AGGTTCTGGA GAATTGATGA 360

GGATTATCCA GTGCAAGTTG CACCTTTGAG CAATGGTCTT TGCTGTGCCT TCTCTACTGA 420

TGGCAGTGTT TTAGCTGCTG GGACACATGA CGGAAGTGTG TATTTTTGGG CCACTCCACG 480

GCAGGTCCCT AGCCTGCAAC ATTTATGTCG CATGTCAATC CGAAGAGTGA TGCCCACCCA 540

AGAAGTTCAG GAGCTGCCGA TTCCTTCCAA GCTTTTGGAG TTTCTCTCGT ATCGTATTTA 600

GAAGATTCTG CCTTCCCTAG TAGTAGGGAC TGACAGAATA CACTTAACAC AAACCTCAAG 660

CTTTACTGAC TTCAATTATC TGTTTTTAAA GACGTAGAAG ATTTATTTAA TTTGATATGT 720

TCTTGTACTG CATTTTGATC AGTTGAGCTT TTAAAATATT ATTTATAGAC AATAGAAGTA 780

TTTCTGAACA TATCAAATAT AAATTTTTTT AAAGATCTAA CTGTGAAAAC ATACATACCT 840

GTACATATTT AGATATAAGC TGCTATATGT TGAATGGACC CTTTTGCTTT TCTGATTTTT 900

AGTTCTGACA TGTATATATT GCTTCAGTAG AGCCACAATA TGTATCTTTG CTGTAAAGTG 960

CAAGGAAATT TTAAATTCTG GGACACTGAG TTAGATGGTA AATACTGACT TACGAAAGTT 1020

GAATTGGGTG AGGCGGGCAA ATCACCTGAG GTCAGCAGTT TGAGACTAGC CTGGCAAACA 1080

TGATGAAACC CTGTCTCTAC TAAAAATACA AAAAAAAAAA AA 1122

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2537 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 422..2029



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CGGCACGAGC CGGGCTCCGT CCGGAGGAAG CGAGGCTGCG CCGCCGGCCC GGCAGGAGCG 60 

GAGGACGGGA GCGCGGGCGG TCGCGCTCGC CCTGTCGCTG ACTGCGCTGC CCCGGCCCAT 120 

CCTTGCCTGG CCGCAGGTGC CCTGGATGAG GCCGCCGCGC GTGTCCCGGC CGCTGAGTGT 180

CCCCCGCGGT CGCCCGGCGC CTGCCCTCAA GCGGCCGCCT CTCCTTGCCC GGGTCCCCGT 240 

TTTCCCCCGG CGCAGTCCTC CTCCGGTGGG CGCCTCCGCA CCTCGGCGCA GGCGGCACGG 300 

CCCTCGGGCC GGGATGGATC CGCCGGGAAG AGGAAGACAA GCCGGGGCGT TGAGCCCCTG 360 
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CGCACGGTGC CGCCGCGCGT AGTGGGAGCT TACTCGCAGT AGGCTCTCGC TCTTCTAATC 420 

A ATG GAT AAA GTG GGG AAA ATG TGG AAC AAC TTA AAA TAC AGA TGC 466 
Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys 
1 5 10 is 

CAG AAT CTC TTC AGC CAC GAG GGA GGA AGC CGT AAT GAG AAC GTG GAG 514 
Gin Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu 
20 25 . 30 

ATG AAC CCC AAC AGA TGT CCG TCT GTC AAA GAG AAA AGC ATC AGT CTG 562 
Met Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser He Ser Leu 
35 40 45 

GGA GAG GCA GCT CCC CAG CAA GAG AGC AGT CCC TTA AGA GAA AAT GTT 610 
Gly Glu Ala Ala Pro Gin Gin Glu Ser Ser Pro Leu Arg Glu Asn Val 
50 55 60 

GCC TTA CAG CTG GGA CTG AGC CCT TCC AAG ACC TTT TCC AGG CGG AAC 658 
Ala Leu Gin Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn 
65 70 75 

CAA AAC TGT GCC GCA GAG ATC CCT CAA GTG GTT GAA ATC AGC ATC GAG 70 6 

Gin Asn Cys Ala Ala Glu lie Pro Gin Val Val Glu He Ser He Glu 
80 85 90 95 

AAA GAC AGT GAC TCG GGT GCC ACC CCA GGA ACG AGG CTT GCA CGG AGA 754 
Lys Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg 
100 105 no 



GAC TCC TAC TCG CGG CAC GCC CCG TGG GGA GGA AAG AAG AAA CAT TCC 
Asp Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser 
115 • 120 125 



AGA ACT CGA AGC GGC CTT CAG AGG CGA GAG CGG CGC TAT GGA GTC AGC 
Arg Thr Arg Ser Gly Leu Gin Arg Arg Glu Arg Arg Tyr Gly Val Ser 
145 150 155 



802 



TGT TCC ACA AAG ACC CAG AGT TCA TTG GAT ACC GAG AAA AAG TTT GGT 850 
Cys Ser Thr Lys Thr Gin Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly 
130 135 140 



898 



TCC ATG CAG GAC ATG GAC AGC GTT TCT AGC CGC GCG GTC GGG AGC CGC 946 
Ser Met Gin Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg 
160 165 170 175

TCC CTG AGG CAG AGG CTC CAG GAC ACG GTG GGT TTG TGT TTT CCC ATG 994
Ser Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met
180 185 190

AGA ACT TAC AGC AAG CAG TCA AAG CCA CTC TTT TCC AAT AAA AGA AAA 1042
Arg Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys
195 200 205

ATA CAT CTT TCT GAA TTA ATG CTG GAG AAA TGC CCT TTT CCT GCT GGC 1090
Ile His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly
210 215 220

TCG GAT TTA GCA CAA AAG TGG CAT TTG ATT AAA CAG CAT ACC GCC CCT 1138
Ser Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro
225 230 235

GTG AGC CCA CAC TCA ACA TTT TTT GAT ACA TTT GAT CCA TCA CTG GTG 1186
Val Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val
240 245 250 255

TCT ACA GAA GAT GAA GAA GAT AGG CTT CGC GAG AGA AGA CGG CTT AGT 1234
Ser Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser
260 265 270

ATC GAA GAA GGG GTG GAT CCC CCT CCC AAC GCA CAA ATA CAC ACC TTT 1282
Ile Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe
275 280 285

GAA GCT ACT GCA CAG GTC AAC CCA TTG TAT AAG CTG GGA CCA AAG TTA 1330
Glu Ala Thr Ala Gln Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu
290 295 300

GCT CCT GGG ATG ACA GAG ATA AGT GGA GAT GGT TCT GCA ATT CCA CAA 1378
Ala Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln
305 310 315

GCA ATT GTG ACT CAG AAG AGG ATT CAA CCA CCC TAT GTC TGC AGT CAC 1426
Ala Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His
320 325 330 335

GGA GGC AGA AGC AGC GCC AGG TGT CCG GGG ACA GCC ACG CGC ACG TTA 1474
Gly Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu
340 345 350

GCA GAC AGG GAG CTT GGA AAG TTC ATA CGC AGA TCG ATT ACA TAC ACT 1522
Ala Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr
355 360 365

GCC TCG TGC CAG ATT TGC TTC AGA TCA CAG GGA ATC CCT GTT ACT GGG 1570
Ala Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly
370 375 380

GCG TGA TGG ACC GAT ACG AGG CCG AAG CCC TTC TAG AAG GGA AAC CGG 1618
Ala * Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg
385 390 395

AAG GCA CGT TCT TGC TCA GGG ACT CTG CAC AGG AGG ACT ACC TCT TCT 1666
Lys Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser
400 405 410 415

CTG TGA GCT TCC GCC GCT ACA ACA GGT CTC TGC ACG CCC GGA TCG AGC 1714
Leu * Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser
420 425 430

AGT GGA ACC ACA ACT TCA GCT TCG ATG CCC ATG ACC CCT GCG TGT TTC 1762
Ser Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe
435 440 445

ACT CCT CCA CGT CAC GGG GCT TCT CGA ACA CTA TAA AGA CCC CAG CTC 1810
Thr Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu
450 455 460

TTG CAT GTT TTT TGA ACC GTT GCT AAC GAT ATC ACT GAA TAG AAC TTT 1858
Leu His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe
465 470 475

CCC TTT CAG CCT GCA GTA TAT CTG CCG CGC AGT GAT CTG CAG ATG CAC 1906
Pro Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His
480 485 490 495

TAC GTA TGA TGG GAT TGA CGG GCT CCC GCT ACC GTC GAT GTT ACA GGA 1954
Tyr Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly
500 505 510

TTT TTT AAA AGA GTA TCA TTA TAA ACA AAA AGT TAG GGT TCG CTG GTT 2002
Phe Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val
515 520 525

AGA ACG AGA CCA GTC AAA GCA AAG TAACTCCTGT CCCCAAAGGG CACTAACTAA 2056
Arg Thr Arg Pro Val Lys Ala Lys
530 535

GTCTGCTCCT CCCGTGCATC GAACTGCACC CATAGGAGGC AGTCAGCTGC TAGGATTTCC 2116 

CACCCAGAAT GGGAGCTTAG TCATTAGCCT CTGCCCTATG GGGTCCGCTG TTCCTCAGAC 2176 

AAAGGTGCCT AGGGACAGCA AGATGGCTTG CAGGTGTTCG GTGGGCTGTG ACAACTGAGG 2236 

GAGGCAACTC TGGGGCATTT GCTATGAAGA ATTCTATTTC TTACCGAAGA ACAAATTATT 2296 

AATATTGGAT GGGTATTTCA ATAGTGTGAC TAATGTTTGA AATTATTTTT TCTAAGAATT 2356 

TTTCTATAAC CTTCAGAAAA AGTAGTGATG TTTGTAGTTA CTATAAATCA AGCTTTGAAA 2416 

GTTCAAAACA AACAAGTTAA ATAAAAGACT ACCTTCCTTT TAGAGAAAAC AAATGCAAGT 2476 

TTTCCCAGCC ACAGGCATTG TGCACTGTTA ATGTTGCTTG TTATCAGCTC CTTTCTCCTC 2536

C 2537 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 535 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys Gln
1 5 10 15

Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu Met
20 25 30

Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser Ile Ser Leu Gly
35 40 45

Glu Ala Ala Pro Gln Gln Glu Ser Ser Pro Leu Arg Glu Asn Val Ala
50 55 60

Leu Gln Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn Gln
65 70 75 80

Asn Cys Ala Ala Glu Ile Pro Gln Val Val Glu Ile Ser Ile Glu Lys
85 90 95

Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg Asp
100 105 110

Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser Cys
115 120 125

Ser Thr Lys Thr Gln Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly Arg
130 135 140

Thr Arg Ser Gly Leu Gln Arg Arg Glu Arg Arg Tyr Gly Val Ser Ser
145 150 155 160

Met Gln Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg Ser
165 170 175

Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met Arg
180 185 190

Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys Ile
195 200 205

His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly Ser
210 215 220

Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro Val
225 230 235 240

Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val Ser
245 250 255

Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser Ile
260 265 270

Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe Glu
275 280 285

Ala Thr Ala Gln Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu Ala
290 295 300

Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln Ala
305 310 315 320

Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His Gly
325 330 335

Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu Ala
340 345 350

Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr Ala
355 360 365

Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly Ala
370 375 380

* Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg Lys
385 390 395 400

Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser Leu
405 410 415

* Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser Ser
420 425 430

Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe Thr
435 440 445

Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu Leu
450 455 460

His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe Pro
465 470 475 480

Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His Tyr
485 490 495

Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly Phe
500 505 510

Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val Arg
515 520 525

Thr Arg Pro Val Lys Ala Lys
530 535

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1221 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GATTAAACAG CATACAGCTC CTGTGAGCCC ACATTCAACA TTTTTTGATA CTTTGATCCA 60

TCTTTGGTTT CTACAGAAGA TGAAGAAGAT AGGCTTAGAG AGAGAAGGCG GCTTAGTATT 120

GAAGAAGGGG TTGATCCCCC TCCCAATGCA CAAATACATA CATTTGAAGC TACTGCACAG 180

GTTAATCCAT TATTAAACTG GGACCAAAAT TAGCTCCTGG AATGACTGAA ATAAGTGGGG 240

ACAGTTCTGC AATTCCACAA GCTAATTGTG ACTCGGAAGA GGATACAACC ACCCTGTGTT 300

GCAGTCACGG AGGCAGAAGC AGCGTCAGAT ATCTGGAGAC AGCCATACCC ATGTTAGCAG 360

ACAGGGAGCT TGGAAAGTCC ACACACAGAT TGATTACATA CACTGCTTCG TGCCTGATTT 420

GCTTCAAATT ACAGGGAATC CCTGTTACTG GGGAGTGATG GACCGTTATG AAGCAGAAGC 480

CCTTCTCGAA GGGAAACCTG AAGGCACGTT TTTGCTCAGG GACTCTGCGC AAGAGGACTA 540

CTTCTTCTCT GTGAGCTTCC GCCGATACAA CAGATCCCTG CATGCCCGAA TTGAGCAGTG 600

GAATCACAAC TTTAGTTTCG ACGCCCATGA CCCGTGTGTA TTTCACTCCT CCACTGTAAC 660

GGGACTTTTA GAACATTATA AAGATCCCAG TTCGTGCATG TTTTTTGAAC CATTGCTTAC 720

TATATCACTA AATAGGACTT TCCCTTTTAG CCTGCAGTAT ATCTGTCGCG CGGTAATCTG 780

CAGGTGCACT ACGTATGATG GAATTGATGG GCTCCCTCTA CCCTCAATGT TACAGGATTT 840

TTTAAAAGAG TATCATTATA AACAAAAAGT TAGAGTTCGC TGGTTGGAAC GAGAACCAGT 900

CAAGGCAAAG TAAACTCTCC GGTCCCCAAA GGGTGTTAAC TAGGTCCGCT TTCATGTGCA 960

TCAGACAGTA CACCTATAGC AAGCACACGT AGCAGTGTTA GGCTTTTTCA TACAGTATGT 1020

AAGCTTAGTG TTAGTATCTG TCAGATGCTA CCTGCTGTTA CTTATTCAGA TAAACATGGT 1080 

GCCTATTGGA ACAATAGCGG ATAGAGCTAC AGGTGTTCAG TAAGACTACA AAAACATTTT 1140

GCCTATTTCG CTAACAGTTT GGTTTTTAAT GGCTGTGGTA TTTGAGTGAG GCAACTCTGG 1200 

GGCATTTGTT ATGAAGAAAT G 1221 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2369 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 116..1330



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

GGCACGAGGC GGTGGTGGCG GCGGCGGGCG CGGCCGCGGC GGGGCGGGCG CGGAATGAAG 60 

GCCCACGGCC CTGGGGGCTG AGGCGCCCGC CGCCTGGGGC GGGCCGCGCG TCCTC ATG 118 

Met 
1 

GAG GCC GGA GAG GAG CCG CTG CTG CTG GCT GAA CTC AAG CCT GGG CGC 166
Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly Arg
5 10 15

CCC CAC CAG TTC GAC TGG AAG TCA AGC TGC GAG ACC TGG AGC GTG GCC 214
Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val Ala
20 25 30

TTC TCG CCA GAC GGT TCC TGG TTC GCC TGG TCT CAA GGA CAC TGC GTG 262
Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys Val
35 40 45

GTC AAG CTG GTC CCC TGG CCC TTA GAG GAA GAG TTC ATC CCT AAA GGA 310
Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys Gly
50 55 60 65

TTC GAA GCC AAG AGC CGA AGC AGC AAG AAT GAC CCA AAA GGA CGG GGC 358
Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg Gly
70 75 80

AGT CTG AAG GAG AAG ACG CTG GAC TGT GGC CAG ATT GTG TGG GGG CTG 406
Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly Leu
85 90 95

GCC TTC AGC CCG TGG CCC TCT CCA CCC AGC AGG AAA CTC TGG GCA CGT 454
Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala Arg
100 105 110

CAC CAT CCC CAG GCG CCT GAT GTT TCT TGC CTG ATC CTG GCC ACA GGT 502
His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr Gly
115 120 125

CTC AAC GAT GGG CAG ATC AAG ATT TGG GAG GTA CAG ACA GGC CTC CTG 550
Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu Leu
130 135 140 145

CTT CTG AAT CTT TCT GGC CAC CAA GAC GTC GTG AGA GAT CTG AGC TTC 598
Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser Phe
150 155 160

ACG CCC AGC GGC AGT TTG ATT TTG GTC TCT GCA TCC CGG GAT AAG ACA 646
Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys Thr
165 170 175

CTT CGA ATT TGG GAC CTG AAT AAA CAC GGT AAG CAG ATC CAG GTG TTA 694
Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val Leu
180 185 190

TCC GGC CAT CTG CAG TGG GTT TAC TGC TGC TCC ATC TCC CCT GAC TGT 742
Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp Cys
195 200 205

AGC ATG CTG TGC TCT GCA GCT GGG GAG AAG TCG GTC TTT CTG TGG AGC 790
Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp Ser
210 215 220 225

ATG CGG TCC TAC ACA CTA ATC CGG AAA CTA GAA GGC CAC CAA AGC ACT 838
Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser Ser
230 235 240

GTT GTC TCC TGT GAT TTC TCT CCT GAT TCA GCC TTG CTT GTC ACA GCT 886
Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr Ala
245 250 255

TCG TAT GAC ACC ACT GTG ATT ATG TGG GAC CCC TAC ACC GGC GCG AGG 934
Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala Arg
260 265 270

CTG AGG TCA CTT CAT CAC ACA CAA CTT GAA CCC ACC ATG GAT GAC AGT 982
Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp Ser
275 280 285

GAC GTC CAC ATG AGC TCC CTG AGG TCC GTG TGC TTC TCA CCT GAA GGC 1030
Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu Gly
290 295 300 305

TTG TAT CTC GCT ACG GTG GCA GAT GAC AGG CTG CTC AGG ATC TGG GCT 1078
Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp Ala
310 315 320

CTG GAA CTG AAG GCT CCG GTT GCC TTT GCT CCG ATG ACC AAT GGT CTT 1126
Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly Leu
325 330 335

TGC TGC ACG TTC TTC CCA CAC GGT GGA ATT ATT GCC ACA GGG ACG AGA 1174
Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr Arg
340 345 350

GAT GGC CAT GTC CAG TTC TGG ACA GCT CCC CGG GTC CTG TCC TCA CTG 1222
Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser Leu
355 360 365

AAG CAC TTA TGC AGG AAA GCC CTC CGA AGT TTC CTG ACA ACG TAT CAA 1270
Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr Gln
370 375 380 385

GTC CTA GCA CTG CCA ATC CCC AAG AAG ATG AAA GAG TTC CTC ACA TAC 1318
Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr Tyr
390 395 400

AGG ACT TTC TAGCAGTGCC GGCTCCCCCA CCTCCTGCAG CAGCAGCAGT 1367
Arg Thr Phe
405

ACAAGGGACT GGCTAGGATG GAGTCAGGCA GCTCACACTG GACCAGTGTG GACCTTCCTT 1427 
CCTCCCATGG CATGTGCAAG TAGGTCTGCG TGACCCCACT TCTGTGGTGC CGGCCTTACC 1487 

TCGTCTTCAT CCGTGGTGAG CAGCCTTCGT CAGTCTAGTT GTGTTGAAGC CAAGTGCAGT 1547 

TGTGGATGTT GCTGGGGTAA TAAAGGCAAG CGGGCTCCAG AGCCTCTCTG GTGGCGGCCA 1607 

AGCCACACTC CCTTAACTGG GAAGTACCTG CCACGTAGGG CATTTCTGCT GCCTATTTCC 1667

AGCCAGCGGC TGCATGGTTT GAAGTTCCTC CGTTGTGGTC AGAAGAACTC TGGTGTTTGG 1727 

TTCCCTGCTC AGCTGCGCGT GGACTGGGCT GAGCTCCTCA CCATACACTA GTGCCGGCTT 1787

TTGTTTCCTG TAAACAGTGG TTGCATGTGT AGAGAAGTAA CAAGCGAGTA TTCAGATCAT 1847 

ACGAGGAGGC GTTCCTCGGT GCATGACGGT CAGATGGCCA TTTATCAGCA TATTTATTTG 1907 

TATTTTCTCA GCACATAGTA AGGTACAACT GTGTTTTCTC AATTGTCTCG AAAAAACAGA 1967 

GTTCTTAAGT GGCCCAGTTG TGGAGCCAAG TCTAAGTCGT GTGGAGTCAG TGCTGACATC 2027

ACTGGCTTGT GCTGTCTGTC ACATGTGTTT GTCTCTGCTG CTTGACCTCA TGGGATGTAC 2087

CCTCCAGTTC AACTGCCCAA AACAGACAGC CCCTTCCAAG CACCGTTCTT TGACAGCGGT 2147 

AGCAGCTACC TATTCAAGAC GCCTCACACA AAATCTGCCT TAGAAAGTTA ATATATTTTA 2207

AATTATTTTA AAAGAAACTC AACATCTTAT TCTTTGGCCT TTCTTAATTG ATGCTTTATG 2267 

GAGGCAGTGT TAACATTGTA CAGTGTATGC ATAGAGGAGT CTCCTCTATT TGAAGAACAA 2327 

TGCAAAATGA GGCTTTCATT GAAGGGAAAA AAAAAAAAAA AA 2369



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 404 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Met Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly
1 5 10 15

Arg Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val
20 25 30

Ala Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys
35 40 45

Val Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys
50 55 60

Gly Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg
65 70 75 80

Gly Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly
85 90 95

Leu Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala
100 105 110

Arg His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr
115 120 125

Gly Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu
130 135 140

Leu Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser
145 150 155 160

Phe Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys
165 170 175

Thr Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val
180 185 190

Leu Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp
195 200 205

Cys Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp
210 215 220

Ser Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser
225 230 235 240

Ser Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr
245 250 255

Ala Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala
260 265 270

Arg Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp
275 280 285

Ser Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu
290 295 300

Gly Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp
305 310 315 320

Ala Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly
325 330 335

Leu Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr
340 345 350

Arg Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser
355 360 365

Leu Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr
370 375 380

Gln Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr
385 390 395 400

Tyr Arg Thr Phe



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1246 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

GACACTGCAT CGTCAAACTG ATCCCCTGGC CGTTGGAGGA GCAGTTCATC CCTAAAGGGT 60 

TTGAAGCCAA AAGCCGAAGT AGCAAAAATG AGACGAAAGG GCGGGGCAGC CCAAAAGAGA 120 

AGACGCTGGA CTGTGGTCAG ATTGTCTGGG GGCTGGCCTT CAGCCTGTGC TTTCCCCACC 180 

CAGCAGGAAG CTCTGGGCAC GCCACCACCC CCAAGTGCCC GATGTCTCTT GCCTGGTTCT 240 

TGCTACGGGA CTCAACGATG GGCAGATCAA GATCTGGGAG GTGCAGACAG GGCTCCTGCT 300 

TTTGAATCTT TCCGGCCACC AAGATGTCGT GAGAGATCTG AGCTTCACAC CCAGTGGCAG 360 

TTTGATTTTG GTCTCCGCGT CACGGGATAA GACTCTTCGC ATCTGGGACC TGAATAAACA 420 

CGGTAAACAG ATTCAAGTGT TATCGGGCCA CCTGCAGTGG GTTTACTGCT GTTCCATCTC 480 

CCCAGACTGC AGCATGCTGT GCTCTGCAGC TGGAGAGAAG TCGGTCTTTC TATGGAGCAT 540 

GAGGTCCTAC ACGTTAATTC GGAAGCTAGA GGGCCATCAA AGCAGTGTTG TCTCTTGTGA 600

CTTCTCCCCC GACTCTGCCC TGCTTGTCAC GGCTTCTTAC GATACCAATG TGATTATGTG 660 

GGACCCCTAC ACCGGCGAAA GGCTGAGGTC ACTCCACCAC ACCCAGGTTG ACCCCGCCAT 720 

GGATGACAGT GACGTCCACA TTAGCTCACT GAGATCTGTG TGCTTCTCTC CAGAAGGCTT 780

GTACCTTGCC ACGGTGGCAG ATGACAGACT CCTCAGGATC TGGGCCCTGG AACTGAAAAC 840 

TCCCATTGCA TTTGCTCCTA TGACCAATGG GCTTTGCTGG CACATTTTTT CCACATGGTG 900 

GAGTCATTGC CACAGGGACA AGAGATGGCC ACGTCCAGTT CTGGACAGCT CCTAGGGTCC 960 

TGTCCTCACT GAAGCACTTA TGCCGGAAAG CCCTTCGAAG TTTCCTAACA ACTTACCAAG 1020 

TCCTAGCACT GCCAATCCCC AAGAAAATGA AAGAGTTCCT CACATACAGG ACTTTTTAAG 1080 

CAACACCACA TCTTGTGCTT CTTTGTAGCA GGGTAAATCG TCCTGTCAAA GGGAGTTGCT 1140 

GGAATAATGG GCCAAACATC TGGTCTTGCA TTGAAATAGC ATTTCTTTGG GATTGTGAAT 1200

AGAATGTAGC AAAACCAGAT TCCAGTGTAC TAGTCATGGA TTTTTC 1246

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

ACCATGGTTC CAAGTCCTCT CCCCTGTGGT CAAGTTGCCC GAATGTTGGG CCCAAGTGCC 60 

TTTTCCTCCT TGGGCCTCCC CTTCTGACCT GCAGGACAGT TTTCCGGAGC CCATTTGGTA 120 

TGAGGTATTA ATTAGCCTTA ACTAAATTAC AGGGGACTCA GAGGCCGTGC TCCTGACCGA 180 

TCCAGACACT ATTTTTTTTT TTTTTTTTTA ACAATGGTGT GCATGTGCAG GAAATGACAA 240 

ATTTGTATGT CAGATTATAC AAGGATGTAT TCTTAAACCG CATGACTATT CAGATGGCTA 300

CTGAGTTATC AGTGGCCATT TATTAGCATC ATATTTATTT GTATTTTCTC AACAGATGTT 360

AAGGTACAAC TGTGTTTTTC TCGATTATCT AAAAACCATA GTACTTAAAT TGAAAAAAAA 420

AA 422



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2019 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GGCACGAGGC GGGGTCAGGG CGGAGGCTGA GGACCAAGTA GGCATGGCGG AGGGCGGGAC 60

CGGCCCCGAT GGACGGGCCG GCCCGGGACC CGCAGGTCCT AATCTGAAGG AGTGGCTGAG 120

GGAGCAGTTC TGTGACCATC CACTGGAGCA CTGTGACGAT ACAAGACTCC ATGATGCAGC 180

CTATGTAGGG GACCTCCAGA CCCTCAGGAA CCTACTGCAA GAGGAGAGCT ACCGGAGCCG 240

CATCAATGAG AAGTCTGTCT GGTGCTGCGG CTCGCTTCCC TGCACACCAC TGAGGATCGC 300

AGCCACTGCA GGCCATGGGA ACTGTGTGGA CTTCCTCATA CGCAAAGGGG CCGAGGTGGA 360

CCTGGTGGAT GTCAAGGGGC AGACTGCCCT GTATGTGGCT GTAGTGAACG GGCACTTGGA 420

GAGCACTGAG ATCCTTTTGG AAGCTGGTGC TGATCCCAAC GGCAGCCGGC ACCACCGCAG 480

CACTCCTGTG TACCATGCCT YTCGTGTGGG TAGGGACGAC ATCCTGAAGG CTCTTATCAG 540

GTATGGGGCA GATGTTGATG TCAACCATCA TCTGAATTCT GACACCCGGC CCCCTTTTTC 600

ACGGCGGCTA ACCTCCTTGG TGGTCTGTCC TCTATACATC AGTGCTGCCT ACCATAACCT 660

TCAGTGCTTC AGGCTGCTCT TGCAGGCTGG GGCAAATCCT GACTTCAATT GCAATGGCCC 720

TGTCAACACC CAGGAGTTCT ACAGGGGATC CCCTGGGTGT GTCATGGATG CTGTCCTGCG 780

CCATGGCTGT GAAGCAGCCT TCGTGAGTCT GTTGGTAGAG TTTGGAGCCA ACCTGAACCT 840

GGTGAAGTGG GAATCCCTGG GCCCAGAGGC AAGAGGCAGA AGAAAGATGG ATCCTGAGGC 900

CTTGCAGGTC TTTAAAGAGG CCAGAAGTAT TCCCAGGACC TTGCTGAGTT TGTGCCGGGT 960

GGCTGTGAGA AGAGCTCTTG GCAAATACCG ACTGCATCTG GTTCCCTCGC TGCCGCTGCC 1020

AGACCCCATA AAGAAGTTTT TGCTTTATGA GTAGCATTCA CATGCAGTGC TGACTGCAAT 1080

GTGGAAGCCG ATCACCTGCA GTGAAAACTG ACACAGACTC TGGCATCCTG GGAACCATGG 1140

CCTGTGCTGC CAGCTTGATC CTTGGCTGTC AGTGAAGAAA AAACGGCTGT GTTCTCTTGG 1200

ACTGTGATTC TATCTCAGGT GCTTGGGCCA TCGAACGCTC CTTGAGTCAT TGTCAACTGA 1260

GAGGCACATA CAAACTTAAT TTTGTTCCTC TTCAGTCTCT CTGTTTTGGA TTCTTCCTGG 1320

CAATGTGTGC AGCATGGGCT GAGCCTGGTG ATTGCCCTAG TGGGGAAGGC TTTTTTCTCC 1380

AGGCTATGCA TCTATTTATG TTCCTACTTT GCAATTTATT GTTCTTTTAA GGCTTGATAT 1440 

CAAAACAGAA AGAGGTTTGT TAAGAAAAGA TATAGGGAGA AAGGAATTCC GGTTCCGTGC 1500 

ACTTGCTAGC CTGCTTTCCT TGCCTGGGTT TGTCTGTCTA TGCTGCCTGG TGCACATCCC 1560 

TTCTCTTTGC TGCCACTGTT CTATTTTGGG AGTTGTCTTC CGTCTAAGAT GGCTTCTGGG 1620

GTTCTATCTT ATTGCACAGA GGTCCCAGAA CAGTGTTCAT AGGGCACCAT CTGCTCTGCC 1680 

AAGGGTTTTC TGATGTCTTA CCCTGGGGAT CTTCAGACAG TGGTTACCTT TAGGAGACCC 1740 

ACCTGGAACT AACCATTAAG TGACTGCCCA CATTCAGATC AGGGACCATC TTAATAGTAC 1800 

TCACTGCCAG TCCTCACAAG AGAAGATGAC ACGGGTGCTC TCTTCAGACA CTCCCATACA 1860

GGAAGTTGGA AAATGTCTTG GTCACCTGGG TTGTTCCCAG GCTACAACTT CTTGGTGTTC 1920 

CACTAARACC AGRATATCCT AGTTTTTTGG GTTGACTGTT CCCTCCCCAC TTTCCTTGAA 1980 

NCCCAATGCC CNTTTGTKTN GGTTGCTTCC CTAAAAKTT 2019



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 350 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Ala Arg Gly Gly Val Arg Ala Glu Ala Glu Asp Gln Val Gly Met Ala
1 5 10 15

Glu Gly Gly Thr Gly Pro Asp Gly Arg Ala Gly Pro Gly Pro Ala Gly
20 25 30

Pro Asn Leu Lys Glu Trp Leu Arg Glu Gln Phe Cys Asp His Pro Leu
35 40 45

Glu His Cys Asp Asp Thr Arg Leu His Asp Ala Ala Tyr Val Gly Asp
50 55 60

Leu Gln Thr Leu Arg Asn Leu Leu Gln Glu Glu Ser Tyr Arg Ser Arg
65 70 75 80

Ile Asn Glu Lys Ser Val Trp Cys Cys Gly Trp Leu Pro Cys Thr Pro
85 90 95

Leu Arg Ile Ala Ala Thr Ala Gly His Gly Asn Cys Val Asp Phe Leu
100 105 110

Ile Arg Lys Gly Ala Glu Val Asp Leu Val Asp Val Lys Gly Gln Thr
115 120 125

Ala Leu Tyr Val Ala Val Val Asn Gly His Leu Glu Ser Thr Glu Ile
130 135 140

Leu Leu Glu Ala Gly Ala Asp Pro Asn Gly Ser Arg His His Arg Ser
145 150 155 160

Thr Pro Val Tyr His Ala Xaa Arg Val Gly Arg Asp Asp Ile Leu Lys
165 170 175

Ala Leu Ile Arg Tyr Gly Ala Asp Val Asp Val Asn His His Leu Asn
180 185 190

Ser Asp Thr Arg Pro Pro Phe Ser Arg Arg Leu Thr Ser Leu Val Val
195 200 205

Cys Pro Leu Tyr Ile Ser Ala Ala Tyr His Asn Leu Gln Cys Phe Arg
210 215 220

Leu Leu Leu Gln Ala Gly Ala Asn Pro Asp Phe Asn Cys Asn Gly Pro
225 230 235 240

Val Asn Thr Gln Glu Phe Tyr Arg Gly Ser Pro Gly Cys Val Met Asp
245 250 255

Ala Val Leu Arg His Gly Cys Glu Ala Ala Phe Val Ser Leu Leu Val
260 265 270

Glu Phe Gly Ala Asn Leu Asn Leu Val Lys Trp Glu Ser Leu Gly Pro
275 280 285

Glu Ala Arg Gly Arg Arg Lys Met Asp Pro Glu Ala Leu Gln Val Phe
290 295 300

Lys Glu Ala Arg Ser Ile Pro Arg Thr Leu Leu Ser Leu Cys Arg Val
305 310 315 320

Ala Val Arg Arg Ala Leu Gly Lys Tyr Arg Leu His Leu Val Pro Ser
325 330 335

Leu Pro Leu Pro Asp Pro Ile Lys Lys Phe Leu Leu Tyr Glu
340 345 350



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 419 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

GCATCCATGG CGGAGGGCGG CAGCACGACG GGCGGGCAGG GCCGGGCTCC GCAGGTCGTA 60 

ATCTGAAGGA GTGGCTGAGG GAGCAATTTT GTGATCATCC GCTGGAGCAC TGTGAGGACA 120 

CGAGGCTCCA TGATGCAGCT TACGTCGGGG ACCTCCAGAC CCTCAGGAGC CTATTGCAAG 180 

AGGAGAGCTA CCGGAGCCGC ATCAACGAGA AGTCTGTCTG GTGCTGTGGC TGGCTCCCCT 240 

GCACACCGTT GCGAATCGCG GCCACTGCAG GCCATGGGAG CTGTGTGGAC TTCCTCATCC 300

GGAAGGGGGC CGAGGTGGAT CTGGTGGACG TAAAAGGACA GACGGCCCTG TATGTGGCTG 360

TGGTGAACGG GCACCTAGAG AGTACCCAGA TCCTTCTCGA AGCTGGCGCG GACCCCAAC 419



(2) INFORMATION FOR SEQ ID NO: 27: 



(i) SEQUENCE CHARACTERISTICS:



(A) LENGTH: 595 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

GAGGAAGAAG AAAAGTGGAC CCTGAGGCCT TGCAGGTCTT TAAAGAGGCC AGAAGTGTTC 60 

CCAGAACCTT GCTGTGTCTG TGCCGTGTGG CTGTGAGAAG AGCTCTTGGC AAAACCGGCT 120 

TCATCTGATT CCTTCGCTGC CTCTGCCAGA CCCCATAAAG AAGTTTCTAC TCCATGAGTA 180 

GACTCCAAGT GCTGCGGTTG ATTCCAGTGA GGGAGAAAGT GATCTGCAGG GAGGTGGACA 240 

CCGAGCCCTG AGTGCTGTGC TGCTGCTGGT CTCCTGATGG CTGTTGCTGC AGAAGATGTC 300 

CTCGTAGACT GTCATTGCTC CTCAGGTGCC TGGGCCGCTG AACAGTCCTT GGGTCATTGT 360 

CAGCTGAGAG GCTTATACTA AAGTTATTAT TGTTTTTCCC AAGTTCTCTG TTCTGGATTT 420 

TCAGTTGCAT ATTAATGTAA CGGGCCATGG GGTATGTACA TGTAGGGGCT GAGGTTGGAG 480 

GCCTACTAAT TTCCTGTAGG GAAGACTCCC AGCACTTCTG GAACTGTGCT TCTCTTTATT 540 

TTTCTACTTC TCAATTTGAT GGTTCGATTA AAGCCTTCTA GTATCTCAAT GAAAA 595 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 896 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 4..396



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTG ATG TCC GCA ATT CTG AAG GTT GGA CAC CAC TGC TGG CTG CCT GTG 48
Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val
1 5 10 15

ACA TCC GCT GTC AAT CCC CAA AGG ATG CTG AGG CCA CCA CCA ACC GCT 96
Thr Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala
20 25 30

GTT TTC AAC TGT GCC GCT TGC TGC TGT CTG TGG GGG CAG ATG CTG ATG 144
Val Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met
35 40 45

AAT ACA TAC CGT GTA GTT CAG CTT CCT GAG GAG GCC AAG GGC TTG GTG 192
Asn Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val
50 55 60

CCA CCA GAG ATT CTA CAG AAG TAC CAT GGA TTC TAC TCT TCC CTC TTT 240
Pro Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe
65 70 75

GCC TTG GTG AGG CAG CCC AGG TCG CTG CAG CAT CTC TGC CGT TGT GCG 288
Ala Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala
80 85 90 95

CTC CGC AGT CAC CTG GAG GGC TGT CTG CCC CAT GCA CTA CCG CGC CTT 336
Leu Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu
100 105 110

CCC CTG CCA CCG CGC ATG CTC CGC TTT CTG CAG CTG GAC TTT GAG GAT 384
Pro Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp
115 120 125

CTG CTC TAC TAGGCTTGCT GCCCTGTGAA CAAAGCAGAC CCCACCCCCA 433
Leu Leu Tyr

CCCCAAGGGC ATCTCTCAGC AATGAATGAT GCAAGGCGGT CTGTCTTCAA GTCAGGAGTG 493

GACGCCTTGA TCCACACTTG AGAGAAGAGG CCAGATCAGC ACCYGGCTGG TAGTGATNGC 553 

AGAGGGCACC TGTGCAGATC TGTGTGCGCA CTGGAAATCT CTAGGCTGAA GGCYAGAGCA 613 

AATGGTGCAR GTGTTAGTCC TTGGGANGAG AGACAGANGG TGAGAAAGCA AGACAGAGGT 673 

GAGAGTGCAC ATGTCAAGTG GTAGATTGCC TTAAAAGAAA GCTAAAAAAA GAAAAAGATT 733 

CGGGCGAACT TCTTTAGGGG TAATGCTGCA GCGTGTTAAA CTGACTGACC AGCGTCCATA 793

TCTTTGGACC CTTCCCGGGT GAAAAAGCCC CTTCATCCTC CAGCGCTCCC CAAGGGTGCT 853 

TAGCAATACC GGGTGCTTTT CTGCCGCAAA GTGAGTTACC AAA 896 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 130 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val Thr
1 5 10 15

Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala Val
20 25 30

Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met Asn
35 40 45

Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val Pro
50 55 60

Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe Ala
65 70 75 80

Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala Leu
85 90 95

Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu Pro
100 105 110

Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp Leu
115 120 125

Leu Tyr
130

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 436 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GTGGGGGCGT CATCATGACC TCCTCTAGGG CTCTGCAACA TGACTCCTGT GGTGCAAATC 60

AACAAATTGT TCACTGATGA ATCCACAAGG ATCTCTGGGC CTACAACCAG GTCCTGGTCC 120 

ACATGACTGT CGTCTTCGGA GAAGGCACCA CTCGCCCCCG GCAGGTACGG CTGACACCTC 180 

CATGGGAGAA GACGTATCCA GGCAGCAGCT GCGCGGCCCT TCAAGAGGGC ACATCCCGTC 240 

ATCTAAAGGC ACGGTGTACT GAAGGTAGTC CTGAGACATG AGTCCGATTA CTACAGGCAC 300 

GTGTTCCTCC AGGTGGAGGC TCAGGTCCCC GGGTGAGCTG GGGCTGCAGC GGGACTCAGG 360

GCGCGGCTCT GGCTGCAGGT CTCGCAGCTC CCTGGGCTGT AGCTCCCGCA GATCCTTGCG 420

CACACCGTTG ACTGGT 436

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2180 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

TTAATAGTAC CTACATAGTA GAAAATTATA ACTCCACTTT AAAACAATGT TTTCTTTCTA 60 

TTCAAATCAA TTTAAAACTT TTTATAAACA TTAATGTTGC AAGAGAATCC AGTCCATTTA 120 

TGAAAATTAG TTGACAATCA AGTTCACCCA AGAAAATGTT GACTAAGCTA AAGAAATCAC 180 

AGATAAAACA TTTTACCAAA AGGATAGGTA ACACACAAAA AAATGCTATC ACAGGAAGCT 240 

ATGATCATCT AATATTTCTT TAATAATAAT TCTAGTTCCA TAGGTTTTCA TGTTATGCCA 300 

ATTTGTACCC GAGTTTAATT ACAGAAAAGG CAACAATTTC TAAATTGGTG GTATACATTT 360 

CTTTACAATT TTTTAATGTA AGGCCATTTA TTAAAATAGA CAAACTAGAA GATGAAAACG 420 

AAGGCAACAG AAAAATTCAA CTTTTCACAA CCAAAAGAAT TAGCACAACC TTAGAAATAA 480 

TTTAGAAAAA AGTGTTGTTA AAAGATATGT TGCAGATCTC CGTTCCATTA CCCAAGATTA 540 

TGTCAATTCA CGATTCTAAA TAAATCTTTT TAAAGTAAGA GATTAAAAAC TCATCTTCAG 600 

TGTATATGTA AATTCCGTGG TTTTATCACA CAGGTATGTT TATTCAACAC TGCTTTGGAA 660 

ATGGACCATT TAAAAGGACA TGGCAATTTC CATTCTGTTA AGTTTCATTC AACCTTTACT 720 

TAGGGGTTGA TTACCACATG AAATGTGCTT TTAATGCATA AAAATCACAG TGGATTAGCC 780 

AGCAAAAGGG ACTGGGCGGG GGGGGCATTG AGGAGAATTT GATAATTCAC ATTGTGATTA 840 

TTCTGCACAT TGATGAAACA TAATTCACAC CTCTAAAACC TCAAGACTTC CCTTTTTTAA 900 

AGAACCAAAA TAAACCCAAG ACACCTTGCT GACACTTCCC CACCCCTAAA CAAACTGATG 960 

ACTCTTTTAC ACATAAAACT GAAATAGTTA TGGCAGCAAA AGATTTTGAT GGCAATGAAA 1020 
GTTTGTAAAC TGTATTTCAA TCTCTTGTTC TTATTCCCAA AGTGCAAGAT GCAGGGTTCT 1080 
CAATCTTTCA GTAGTGCTTC TCCTGTAAAT AATCCTTCAT TTTGTTTGGC AAAGGCAGTT 1140 
TCTGAATTAA GTCTATTCTG GTATACTGAC GTATAACAAA ACGACACAGG TACTGCAACG 1200 
AGCGCACCTA TGAACCCCGG AACACTGGTT GGCAAGTTCT GACGGAAGTG CAGATTCCAG 1260 
GCAGCGAGAC CTTGAATAAC AAAAAGCTCC CATTTTCAGA GTCCCTGATT GAATGCTCCA 1320 
ATTAGATCAA CTATGGACGT ATGTCCTTCC ACATCGGCTG TTCATAAAAG CTAAACCTAC 1380 
CATTTGAGTG CTCAATTCTA GTGTGAAGTG TTTTACCATG GGAGCGAAAG TCACAGCTTA 1440 
AAAGGTAACG GTCGTCAGAA CTGTCCCGAA CAAGAAAAGA ACCATCTGGC ACGTTTGCTA 1500 
GCTTCCCTTC TGCCTCCCAA CGTGTGATTG GTCCCCAGTA CCATCCTTGC TTTGCAAGTT 1560 
TTTTCAGCTC CTCTGTAAGG CTTGTCACAA CCATGGGACC ACTACTTTGC ACTGAGTCAT 1620 
AAACTCTTGC AACCCCAGGA GCAGAGTTCG GATCAAAATT CAAATGACAG CGCATAACTT 1680 
TCAGCCACGT GGGGCTTTCT GTCCAGTGAG TCCACTGAAA GTTCCCCTTT GGGATTTGGA 1740 
TTATTCCTGC ATTGGAGTAA CCAATGGTGA AGATTGGAGG GACATCCATC GTGAACCCGC 1800 
TCTCCGGGGT TCTGCAACAT GACTCCCGTG GTGCCAATCA ACAAGCCATT CACCGGACTG 1860 
ATCCACGAAG ATCTCTGGGG CGACAACTAG GTCCTGGTCT ACCTGACTCT CATCCTCGGG 1920 
GAAAGCGCGC CCTCCCACTT GAGGAGGAAC CGCAGAGACT TCCATGGGAG AAGAGCTGTC 1980 
CAGACAATAG CTCCGTGATC CTTCCAAAGG ATACATCCCC TCATCTAAAG GCACAGTATA 2040 
CTGAATGTAG TCCTGAGGCA TAAGTCCAAT AACGACAGGC ACATGTTCAT CCAGGTGAAG 2100 
ATGCAGGTCT CCATTATGAG AAGCCGAGCT CTTCAGTGAA TTGGCTTGCT CCTGGCACGT 2160 
GGTCTCAGAC TGGAGGTCGT 2180 

(2) INFORMATION FOR SEQ ID NO: 32: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2649 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

GGCACGAGGC TGTGTCCAGC ACACAGAGAG GGCCCGGCCA TCTGCTTTGG TTCAGAGCCC 60 

TGTGTCTGTC TGTCACTTAG ACTCTTCCTC CCGGCTCGCA GCTCACCCTC CATCCTCCTT 120 

ACTGGCTCCA GCATGACTCG CTTCTCTTAT GCAGAGTACT TTGCTCTGTT TCACTCTGGC 180 

TCTGCACCTT CCAGGTCCCC TTCGTCTCCC GAGAACCCAC CGGCCCGCGC ACCCCTGGGT 240 

CTGTTCCAAG GGGTCATGCA GAAGTATAGC AGCAACCTGT TCAAGACCTC CCAGATGGCG 300 

GCTATGGACC CCGTGCTGAA GGCCATCAAG GAAGGGGATG AAGAGGCCTT GAAGATCATG 360 

ATCCAGGATG GGAAGAATCT TGCAGAGCCC AACAAGGAGG GCTGGCTGCC GCTCCACGAG 420 

GCTGCCTACT ATGGCCAGCT GGGCTGCCTG AAAGTCCTGC AGCAAGCCTA CCCAGGGACC 480 

ATTGACCAAC GCACACTGCA GGAAGAGACA GCATTATACC TGGCCACATG CAGAGAACAC 540 

CTGGATTGCC TCCTGTCGCT GCTCCAGGCG GGGGCAGAGC CTGACATCTC TAACAAATCC 600 

AGGGAGACTC CACTTTACAA AGCCTGTGAG CGCAAGAACG CGGAGGCGGT GAGGATATTG 660 

GTGCGATACA ACGCAGACGC CAACCACCGC TGTAACAGGG GCTGGACCGC ACTGCACGAG 720 

TCTGTCTCCC GCAATGACCT GGAGGTCATG GAGATCCTAG TGAGTGGCGG GGCCAAGGTG 780 

GAGGCCAAGA ATGTCTACAG CATCACCCCT TTGTTTGTGG CTGCCCAGAG TGGGCAGCTG 840 

GAGGCCCTGA GGTTCCTGGC CAAGCATGGT GCAGACATCA ACACGCAGGC CAGTGACAGT 900 

GCATCAGCCC TCTACGAGGC CAGCAAGAAT GAGCATGAAG ACGTGGTAGA GTTTCTTCTC 960 

TCTCAGGGCG CCGATGCTAA CAAAGCCAAC AAGGACGGCC TGCTCCCCCT GCATGTTGCC 1020 

TCCAAGAAGG GCAACTATAG AATAGTGCAG ATGCTGCTGC CTGTGACCAG CCGCACGCGC 1080 

GTGCGCCGTA GCGGCATCAG CCCGCTGCAC CTAGCGGCCG AGCGCAACCA CGACGCGGTG 1140 

CTGGAGGCGC TGCTGGCCGC GCGCTTCGAC GTGAACGCAC CTCTGGCTCC CGAGCGCGCC 1200 

CGCCTCTACG AGGACCGCCG CAGTTCTGCG CTCTACTTCG CTGTGGTCAA CAACAATGTG 1260 

TACGCCACCG AGCTGTTGCT GCTGGCGGGC GCGGACCCCA ACCGCGATGT CATCAGCCCT 1320 

CTGCTCGTGG CCATCCGCCA CGGCTGCCTG CGCACCATGC AGCTGCTGTT GGACCATGGC 1380 

GCCAACATCG ACGCCTACAT CGCCACTCAC CCCACCGCCT TTCCAGCCAC CATCATGTTT 1440 

GCCATGAAGT GCCTGTCGTT ACTCAAGTTC CTTATGGACC TCGGCTGCGA TGGCGAGCCC 1500 

TGCTTCTCCT GCCTGTACGG CAACGGGCCG CACCACCCGC CCCGCGACCT GGCCGCTTCC 1560 

ACGACGCACC CGTGGACGAC AAGGCACCTA GCGTGGTGCA GTTCTGTGAG TTCCTGTCGG 1620 

CCCCGGAAGT GAGCCGCTGG GCGGGACCCA TCATCGATGT CCTCCTGGAC TATGTGGGCA 1680 

ACGTGCAGCT GTGCTCCCGG CTGAAGGAGC ACATCGACAG CTTTGAGGAC TGGGCTGTCA 1740 

TCAAGGAGAA GGCAGAACCT CCGAGACCTC TGGCTCACCT CTGCCGGCTG CGGGTTCGGA 1800 

AGGCCATAGG AAAATACCGG ATAAAACTCC TGGACACACT GCCGCTTCCC GGCAGGCTAA 1860 

TCAGATACTT GAAATATGAG AATACACAGT AACCAGCCTG GAGAGGAGAT GTGGCCTTCA 1920 

GACTGTTTCC GGGACGCCCC AGGTGGCCTG CATCCAGGAC CCCCTGGGGT CAGAACAGGT 1980 

GTGACCTTGC TGGTTCTTTG CTGGAGCTTC ACCCAAAGTG AGAACCTGAT GTGGGGAGTG 2040 

GACGTGGAAC CTCTGCTTTC ACACTGTCAG CGGATCGCAG ACCCGCTCTG CTTCTGGCCA 2100 

TAGCCAGAGA CCTTCAACCT GGGGCCAGGG GAGAGCTGGT CTGGGCAAGG TGGCCCAGGC 2160 

AGGAATCCTG GCCTTAAGCT GGAGAACTTG TAGGAATCCC TCACTGGACC CTCAGCTTTC 2220 

AGGCTGCGAG GGAGACGCCC AGCCCAAGTA TTTTATTTCC GTGACACAAT AACGTTGTAT 2280 

CAGAAAAAAA AAAAAACATG GGCGCAGCTT ATTCCTTAGT AGGGTATTTA CTTGCATGCG 2340 

CGCTTAAAGC TACTGGAAAC ATGCGTTCCA CTATGCTTGA GAATCCCCTT GCACTGGTAA 2400 

ACGAGAGCCG ACGTGCTTCA AGGTTGGATT TTTGGTTGCC CCTTTGGCGT TCCGCGGGTT 2460 

TGTCCGACGT AATTGACCCC GTGTTTTGTC ACTTTCGAGT GTTCCGACTA TTGGGGGGCT 2520 

TTTGGTTGTC CCCAAAATTG TGGGTGGTGT GCGGACGCCA CGAGAAGTGG TTCATGGGCG 2580 

ATAATCATTA CTGGAGAATG TAGAGCGGCG GTTTTACGAA TAAATATTTT TTAAGCCGCC 2640 

TTCCCAAAA 2649 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 495 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

CCTCCTGAGA GTTCGCCGGC CCGGGCCCAA TGGGTTGTTC CAAGGGGTCA TGCAGAAATA 60 

CAGCAGCAGC TTGTTCAAGA CCTCCCAGCT GGCGCCTGCG GACCCCTTGA TAAAGGCCAT 120 

CAAGGATGCG ATGAAGAGGC CTTGAAGACC ATGATCAAGG AAGGGAAGAA TCTCGCAGAG 180 

CCCAACAAGG AGGGCTGGCT GCCGCTGCAC GAGGCCGCAT ACTATGGCCA GGTGGGCTGC 240 

CTGAAAGTCC TGCAGCGAGC GTACCCAGGG ACCATCGACC AGCGCACCCT GCAGGAGGAA 300 

ACAGCCGTTT ACTTGGCAAC GTGCAGGGGC CACCTGGACT GTCTCCTGTC ACTGCTCCAA 360 

GCAGGGGCAG AGCGGGACAT CTCCAACAAA TCCCGAGAGA ACCGCTCTAC AAAGCCTGTG 420 

AGCGCAAGAA CGCGGAAGCC GTGAAGATTC TTGGTGCAGC ACAACGCAGA CACCAACAAC 480 

GCTGCAACCG GGCTG 495 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 709 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
GTGCAGCTCT GCTCGCGGCT GAAGGAACAC ATCGACAGCT TTGAGGACTG GGCCGTCATC 60 
AAGGAGAAGG CAGAACCTCC AAGACCTCTG GCTCACCTTT GCCGACTGCG GGTTCGAAAG 120 
GCCATTGGGA AATACCGTAT AAAACTCCTA GACACCTTGC CGCTCCCAGG CAGGCTGATT 180 
AGATACCTGA AATACGAGAA CACCCAGTAA CTGGGGCCAC GGGGAGAGAG GAGTAGCCCC 240 
TCAGACTCTT CTTACTAAGT CTCAGGACGT CGGTGTTCCC AACTCCAAGG GGACCTGGTG 300 
ACAGACGAGG CTGCAGGCTG CCTCCCTCTC AGCCTGGACA GCTACCAGGA TCTCACTGGG 360 
TCTCAGGGCC CAGAGCTTTG GCCAGAGCAG AGAACAGAAT GTGTCAAGGA GAAGAATCAT 420 
TTGTTTACAA ACTGATGAGC AGATCCCAGA CCTTCTCTAC CTTCAGGAAT GGCAGAAACC 480 
TCTATTCCTG GGGCCAGGGC AGAGCTTGAG GTGTTCTGGG GAAGGTGGTG CTCAGAGCCT 540 
TCCCTGTGCC CCTCCACTTG TTCTGGAAAA CTCACCACTT GACTTCAGAG CTTTCTCTCC 600 
AAAGACTAAG ATGAAGACGT GGCCCAAGGT AGGGGGTAGG GGGAGCCTGG GTCTTGGAGG 660 
GCTTTGTTAA GTATTAATAT AATAAATGTT ACACATGTGA AAAAAAAAA 709 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 848 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..624 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

TTG GAG AAG TGT GGT TGG TAT TGG GGG CCA ATG AAT TGG GAA GAT GCA 48 
Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala 
1 5 10 15 

GAG ATG AAG CTG AAA GGG AAA CCA GAT GGT TCT TTC CTG GTA CGA GAC 96 
Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp 
20 25 30 

AGT TCT GAT CCT CGT TAC ATC CTG AGC CTC AGT TTC CGA TCA CAG GGT 144 
Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly 
35 40 45 

ATC ACC CAC CAC ACT AGA ATG GAG CAC TAC AGA GGA ACC TTC AGC CTG 192 
Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu 
50 55 60 

TGG TGT CAT CCC AAG TTT GAG GAC CGC TGT CAA TCT GTT GTA GAG TTT 240 
Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe 
65 70 75 80 

ATT AAG AGA GCC ATT ATG CAC TCC AAG AAT GGA AAG TTT CTC TAT TTC 288 
Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe 
85 90 95 

TTA AGA TCC AGG GTT CCA GGA CTG CCA CCA ACT CCT GTC CAG CTG CTC 336 
Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu 
100 105 110 

TAT CCA GTG TCC CGA TTC AGC AAT GTC AAA TCC CTC CAG CAC CTT TGC 384 
Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys 
115 120 125 

AGA TTC CGG ATA CGA CAG CTC GTC AGG ATA GAT CAC ATC CCA GAT CTC 432 
Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu 
130 135 140 

CCA CTG CCT AAA CCT CTG ATC TCT TAT ATC CGA AAG TTC TAC TAC TAT 480 
Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr 
145 150 155 160 

GAT CCT CAG GAA GAG GTA TAC CTG TCT CTA AAG GAA GCG CAG CGT CAG 528 
Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln 
165 170 175 

TTT CCA AAC AGA AGC AAG AGG TGG AAC CCT CCA CGT AGC GAG GGG CTC 576 
Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu 
180 185 190 

CCT GCT GGT CAC CAC CAA GGG CAT TTG GTT GCC AAG CTC CAG CTT TGAAGAACCA 631 
Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu 
195 200 205 



AATTAAGCTA CCATGAAAAG AAGAGGAAAA GTGAGGGAAC AGGAAGGTTG GGATTCTCTG 
TGCAGAGACT TTGGTTCCCC ACGCAAGCCC TGGGGCTTGG AAGAAGCACA TGACCGTACT 
CTGCGTGGGG CTCCACCTCA CACCCACCCC TGGGCATCTT AGGACTGGAG GGGCTCCTTG 
GAAAACTGGA AGAAGTCTCA ACACTGTTTC TTTTTCA 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 207 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala 
1 5 10 15 

Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp 
20 25 30 



Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly 
35 40 45 

Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu 
50 55 60 

Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe 
65 70 75 80 

Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe 
85 90 95 

Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu 
100 105 110 

Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys 
115 120 125 

Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu 
130 135 140 

Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr 
145 150 155 160 

Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln 
165 170 175 

Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu 
180 185 190 

Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu 
195 200 205 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 464 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

GTTCCAAGCC TAACCCATCT TTGTCGTTTG GAAATTCGGG CCAGTCTAAA AGCAGAGCAC 
CTTCACTCTG ACATTTTCAT CCATCAGTTG CCACTTCCCA GAAGTCTGCA GAACTATTTG 
CTCTATGAAG AGGTTTTAAG AATGAATGAG ATTCTAGAAC CAGCAGCTAA TCAGGATGGA 
GAAACCAGCA AGGCCACCTG ACACAGGTCC TTTAATTCTG TTTAGTCACA AAAGACGGCT 
TGTGTGACTG TTTGGATTTG GTGATCAAAT GTCCATGTTT ACAGTTGCTT TTCCCAGTTT 
GTGTCTTTCC CAATATTGTG AACCTTATCC ATCTTGCCTT ACTCAGTTTT ATTTCTAGTG 
CACTTTGTTG TGTATTATTT GTTTACCTGA CCATTTTCTA CTTTATTCTG CTAATAAACT 
GTAATTCTGA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 747 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
GGGGATCGAA AGCGGGGGCT TCTGGGACGC AGCTCTGGAG ACGCGGCCTC GGACCAGCCA 
TTTCGGTGTA GAAGTGGCAG CACGGCAGAC TGGTCAAACA AATGGATTTT ACAGAGGCTT 
ACGCGGACAC GTGCTCTACA GTTGGACTTG CTGCCAGGGA AGGCAATGTT AAAGTCTTAA 
GGAAACTGCT CAAAAAGGGC CGAAGTGTCG ATGTTGCTGA TAACAGGGGA TGGATGCCAA 
TTCATGAAGC AGCTTATCAC AACTCTGTAG AATGTTTGCA AATGTTAATT AATGCAGATT 
CATCTGAAAA CTACATTAAG ATGAAGACCT TTGAAGGTTT CTGTGCTTTG CATCTCGCTG 
CAAGTCAAGG ACATTGGAAA ATCGTACAGA TTCTTTTAGA AGCTGGGGCA GATCCTAATG 
CAACTACTTT AGAAGAAACG ACACCATTGT TTTTAGCTGT TGAAAATGGA CAGATAGATG 
TGTTAAGGCT GTTGCTTCAA CACGGAGCAA ATGTTAATGG ATCCCATTCT ATGTGTGGAT 540 
GGAACTCCTT GCACCAGGCT TCTTTTCAGG AAAATGCTGA GATCATAAAA TTGCTTCTTA 600 
GAAAAGGAGC AAACAAGGAA TGCCAGGATG ACTTTGGAAT CACACCTTTA TTTGTGGCTG 660 
CTCAGTATGG CCAAGCTAGA AAGCTTTGAA GCATACTTAT TTCATCCGGG TGCAAATGTC 720 
AATTGTCAAG CCTTGGACAA AGCTACC 747 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1018 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

CACAAATGGG ACCATACAAA AATCTTGGAC TTGTTAATAA CCACTTACTA ACCGGGACCT 60 

GTGACACTGG GCTAAACAAA GTAAGTCCCT GTTTACTCAG CAGTGTTTGG GGGACATGAA 120 

GGATTGCCTA GAAATATTAC TCCGGAATGG TCTACAGCCC AGACGCCCAG GCGTGCCTTG 180 

TTTTTGGATT CAGTTCTCCT GTGTGCATGG CTTTCCAAAA GGAGGTGGAG CTGTAGTTCT 240 

TTGGAATTGT GAACATTCTT TTGAAATATG GAGCCCAGAT AAATGAACTT CATTTGGCAT 300 

ACTGCCTGAA GTACGAGAAG TTTTCGATAT TTCGCTACTT TTTGAGGAAA GGTTGCTCAT 360 

TGGGACCATG GAACCATATA TATGAATTTG TAAATCATGC AATTAAAGCA CAAGCAAAAT 420 

ATAAGGAGTG GTTGCCACAT CTTCTGGTTG CTGGATTTGA CCCACTGATT CTACTGTGCA 480 

ATTCTTGGAT TGACTCAGTC AGCATTGACA CCCTTATCTT CACTTTGGAG TTTACTAATT 540 

GGAAGACACT TGCACCAGCT GTTGAAAGGA TGCTCTCTGC TCGTGCCTCA AACGCTTGGA 600 

TTCTACAGCA ACATATTGCC CACTGTTCCA TCCCTGACCC ATCTTTGTCG TTTGGAAATT 660 
CGGTCCAGTC TAAAATCAGA ACGTCTACGG TCTGACAGTT ATATTAGTCA GCTGCCACTT 
CCCAGAAGCC TACATAATTA TTTGCTCTAT GAAGACGTTC TGAGGATGTA TGAAGTTCCA 
GAACTGGCAG CTATTCAAGA TGGATAAATC AGTGAAACTA CTTAACACAG CTAATTTTTT 
TCTCTGAAAA ATCATCGAGA CAAAAGAGCC ACAGAGTACA AGTTTTTATG ATTTTATAGT 
CAAAAGATGA TTATTGATTG TCAGATAGGT TAGGTTTTGG GGGGCCAGTA GTTCAGTGAG 
AATGTTTATG TTTACAACTA GCCTTCCCAG TAAAAAAAAA AAAAAAAAAA AAAAAAAA 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1897 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CGGGGGGCTG GGACCTGGGG CGTAACCGTC TCTACCACGA CGGCAAGAAC CAGCCAAGTA 
AAACATACCC AGCCTTTCTG GAGCCGGACG AGACATTCAT TGTCCCTGAC TCCTTTTTCG 
TGGCCCTGGA CATGRATGAT GGGACCTTAA GTTTCATCGT GGATGGACAG TACATGGGAG 
TGGCTTTCCG GGGACTCAAG GGTAAAAAGC TGTATCCTGT AGTGAGTGCC GTCTGGGGCC 
ACTGTGAGAT CCGCATGCGC TACTTGAACG GACTTGATCC TGAGCCCCTG CCACTCATGG 
ACCTGTGCCG GCGTTCGGTG CGCCTAGCGC TGGGAAAAGA GCGCCTGGGT GCCATCCCCG 
CTCTGCCGCT ACCTGCCTCC CTCAAAGCCT ACCTCCTCTA CCAGTGATCC ACATCCCAGG 
ACCGCCATAC GACAGCCATC TGGTGCCAAR TCACTGAGCC CGTTGGGGTC CGCCGACCCC 
TGCGCCTGGG ATGGAAGCCC ACCTCAGCCA TGGGCAGACG TGCCCCCTCA TCCTACCGGC 
TGCCTCTGCT GGGGGAACCT ATGCCAACGG ACTTCTCCCT TCCCAACACT GGCTGAAGCA 600 

GCAGCACCCA GGCCCTTCCC TGAACCAGAT GCAGAGAATA AACTATGAAA ACCTCTCTCA 660 

GGCGCCTTCT GCTCTCAGGT GGAGTGGGCT GCCCCCCACT CTCTGCAGAG AGAGGCTACA 720 

CCCACCTGGG GGGTCCTGGG AGGTAAGACT AGTAGGAGGT GCCAGGGCTG ARTCCAAAAG 780 

CAGGAATGGC CAGGAMCAGG CCATACAGAT GAAGCTCAGG ATGTCACATA CCATGGACAM 840 

TGAGACAGAA CCCCAGGTTG GAMTTCCCTT GGGCCAACGA GTGCCAGCTT TAATGTCAGC 900 

TGCMGGTGCT CTGTGGCCTG TATTTATTCT TTAAACAGTA GCAAAGGCCA TTTATTTATT 960 

CCACTTAGAA AGGAAACCTT GGTGGGTGGY TTCCCTCGAT GTGCTTTCCC CCACCTCCCT 1020 

GGAATGTGTG TGCCACACCT GTCCTTGTCC CAGGCCAGGA CTGTGGCACA TGAGCTGGTG 1080 

TGCACAGATA CACGTATGTC GTCGTGCATG ACCCCTGACT AGTTCCTAAG TAGCCCTGCA 1140 

CCAAGCACCA GAGCAGACCC CAAGAGAGGC CCGTGCAAGT CCCCATGTCC CCAGGTCCCT 1200 

GCTTCTGTTG CCTTGGGACT CATACACCGG CACACGTGTT TCAGCCTCTT GACTTCCATG 1260 

AGCTTCGAAT TTTGCCCCCG ATTCTTCTGA TATTTCCCAT TGGCATCCTC CAAAGCTCTG 1320 

GGCCTGGAGG GCATTAGGAC ACATGGAATG AGTGGGGTCT CCAGCCCCTG GGAAAGCCAC 1380 

TGGCAAGGCA GGATTAGAAA GACCAAGAGC AGGGTGGGGC GCCATGAAGC CTGTATGCCT 1440 

CTCAGGCTCA AGACCCCGCC ACACACCCAC TCAAGCCTCA GAAGTGGTGT GTAGGGCAGC 1500 

CCCAGGAGAG GAATGCCTGT CCTAGCAGCA CGTACATGGA GCACCCCACA TGTGCTCCAG 1560 

CCCTCTGGCT GTTTCTCTTG CTCTAGAATC AACTCCCTAC ATTGGGAATG TAGCCATTTG 1620 

GTAGAGGACT TGCCTAGCCT GCAGGAAGCT CACGTTCCAT CCCCTGCACC AAGGAGAATC 1680 

AAAGCTCAGG AGGCTGAGGC AGGAGGATTG CTGTCAGTGG TGTACAGAGG TCATGGCCAT 1740 

CCTGGGCTAT ATTAAACCTT GTCCTTTAAG AAAAAGAAAA GAAATCAACT TCCATTGAAT 1800 

CTGAGTTCTG CTCATTTCTG CACAGGTACA ATAGATGACT TKATTTGTTG AAAAATGKTT 1860 

AATATATTTA CMTATATATA TATTTGTAAG AAGCATT 1897 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 134 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 



Gly Gly Trp Asp Leu Gly Arg Asn Arg Leu Tyr His Asp Gly Lys Asn 
1 5 10 15 

Gln Pro Ser Lys Thr Tyr Pro Ala Phe Leu Glu Pro Asp Glu Thr Phe 
20 25 30 

Ile Val Pro Asp Ser Phe Phe Val Ala Leu Asp Met Xaa Asp Gly Thr 
35 40 45 

Leu Ser Phe Ile Val Asp Gly Gln Tyr Met Gly Val Ala Phe Arg Gly 
50 55 60 

Leu Lys Gly Lys Lys Leu Tyr Pro Val Val Ser Ala Val Trp Gly His 
65 70 75 80 

Cys Glu Ile Arg Met Arg Tyr Leu Asn Gly Leu Asp Pro Glu Pro Leu 
85 90 95 

Pro Leu Met Asp Leu Cys Arg Arg Ser Val Arg Leu Ala Leu Gly Lys 
100 105 110 

Glu Arg Leu Gly Ala Ile Pro Ala Leu Pro Leu Pro Ala Ser Leu Lys 
115 120 125 

Ala Tyr Leu Leu Tyr Gln 
130 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 265 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

AAGGGTAAAA AACTGTATCC TGTAGTGAGT GCCGTCTGGG GCCACTGTAG ATCCGAATGC 60 

GCTACTTGAA CGGACTCGAT CCCGAGACTG CCGCTCATGG ATTTGTGCCG TCGCTCGGTG 120 

CGCCTGGCCC TGGGGAGGGA GCGCCTGGGG GAGAACCACA CCTGCCGCTG CCGGCTTCCC 180 

TCAAGGCCTA CCTCCTCTAC CAGTGACGTT CGCCATCATA CCGCCAGCGC GACAGCCACC 240 

TGGTGCCAAC TCACTGAGCC GCCTG 265 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2438 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

AAGTGGCGGC GGTCCCTGGA GAGCAGGCGG AGGCAGCGGC AAGTCTGACT CTGGGCTGAC 60 

CGTGGAGCCG GGGCGGGGGC TGACAGCCAG GCCTCCGCCT GGCGGGAGCC GCACGAGGAG 120 

CGGGAGTGGC CGGGCCTCTC TTCCGCGCTT GAGCGAGCGC CGGGTGATGG CGGTGGTGAT 180 

GGCGGCAGGC GCTCGGACAG CTCCGCTTGA GCTGAGCTCG GAGAGATCCG TCCAGAAAGT 240 

GCCCAGAAGA AACTTCCTCT TAGAAAAGCT GAAAAACACA RTATTTATAA CACTGGAAAT 300 

TGTAAAGAAT TTGTTTAAAA TGGCTGAAAA CAATAGTAAA AATGTAGATG TACGGCCTAA 360 

AACAAGTCGG AGTCGAAGTG CTGACAGGAA GGATGGTTAT GTGTGGAGTG GAAAGAAGTT 420 

GTCTTGGTCC AAAAAGAGTG AGAGTTGTTC TGAATCTGAA GCCATAGGTA CTGTTGAGAA 480 

TGTTGAAATT CCTCTAAGAA GCCAAGAAAG GCAGCTTAGC TGTTCGTCCA TTGAGTTGGA 540 

CTTAGATCAT TCCTGTGGGC ATAGATTTTT AGGCCGATCC CTTAAACAGA AACTGCAAGA 600 

TGCGGTGGGG CAGTGTTTTC CAATAAAGAA TTGTAGTGGC CGACACTCTC CAGGGCTTCC 660 

ATCTAAAAGA AAGATTCATA TCAGTGAACT CATGTTAGAT AAGTGCCCTT TCCCACCTCG 720 

CTCAGATTTA GCCTTTAGGT GGCATTTTAT TAAACGACAC ACTGTTCCTA TGAGTCCCAA 780 

CTCAGATGAA TGGGTGAGTG CAGACCTGTC TGAGAGGAAA CTGAGAGATG CTCAGCTGAA 840 

ACGAAGAAAC ACAGAAGATG ACATACCCTG TTTCTCACAT ACCAATGGCC AGCCTTGTGT 900 

CATAACTGCC AACAGTGCTT CGTGTACAGG TGGTCACATA ACTGGTTCTA TGATGAACTT 960 

GGTCACAAAC AACAGCATAG AAGACAGTGA CATGGATTCA GAGGATGAAA TTATAACGCT 1020 

GTGCACAAGC TCCAGAAAAA GGAATAAGCC CAGGTGGGAA ATGGAAGAGG AGATCCTGCA 1080 

GTTGGAGGCA CCTCCTAAGT TCCACACCCA GATCGACTAC GTCCACTGCC TTGTTCCAGA 1140 

CCTCCTTCAG ATCAGTAACA ATCCGTGCTA CTGGGGTGTC ATGGACAAAT ATGCAGCCGA 1200 

AGCTCTGCTG GAAGGAAAGC CAGAGGGCAC CTTTTTACTT CGAGATTCAG CGCAGGAAGA 1260 

TTATTTATTC TCTGTTAGTT TTAGACGCTA CAGTCGTTCT CTTCATGCTA GAATTGAGCA 1320 

GTGGAATCAT AACTTTAGCT TTGATGCCCA TGATCCTTGT GTCTTCCATT CTCCTGATAT 1380 

TACTGGGCTC CTGGAACACT ATAAGGACCC CAGTGCCTGT ATGTTCTTTG AGCCGCTCTT 1440 

GTCCACTCCC TTAATCCGGA CGTTCCCCTT TTCCTTGCAG CATATTTGCA GAACGGTTAT 1500 

TTGTAATTGT ACGACTTACG ATGGCATCGA TGCCCTTCCC ATTCCTTCGC CTATGAAATT 1560 

GTATCTGAAG GAATACCATT ATAAATCAAA AGTTAGGTTA CTCAGGATTG ATGTGCCAGA 1620 

GCAGCAGTGA TGCGGAGAGG TTAGAATGTC GACCTGCATA CATATTTTCA TTTAATATTT 1680 

TATTTTTCTT ATGCCTCTTT GAATTTTTGT ACAAAGGCAG TTGAATCAAA TAAAACTGTG 1740 

CCCTAAGTTT TAATTCCAGA TCAATTTATT TTTTTTATGA TACACTTGTT ATATATTTTT 1800 

AAGCAGGTGT TTGGTTTTGT TTTTACCATA TAAATTTACA TATGGTCCAG GCATATTTAC 1860 

AATTTCAAGG CATTGCATAT ACATTTGAAT ATTCTGTATT TTTTAAATAA TCTTTTGTTC 1920 

TTTCCTATGT GTGAAATATT TTGCTAATCT ATGCTATCAG TATTCTTGTA TGACCGAATA 1980 

GTTACCTATT CTCTTTTCAT CTTGAAGATT TTCAGTAAAG AGTGTTGTAA TCAATCCATT 2040 

ATAATGTAAT TGACTTTTGT AATTTGCCAA TAGGAGTGTT AAACAACAAA ATGATTTAAA 2100 

ATGAAACTTA ATGTATTTTC ATTTTAAATA TTAACTAAAC CAAGTTTGTT TGTTAGTTAT 2160 

TCTAGCCAAT AAGAAAAGAG AATGTAGCAT CCTAGAGGTG TATTTGTTCT GCAGTTTGGC 2220 

AGGACCGTCA GTTAGTCCAA ATAAACATCC CCTCAGCGTG GAGGCGAATG GAACCTGTGC 2280 

TCCTTTCTTA CGGGAAGCTT TGCAAAGCAA AATAGCAGGG TTACAAGCTT GGAGTTGTTA 2340 

AGGCAACTAG AGTTTTCTCT ATTAATTTAT AGACTGTTGT TGCACCTACT TAGCTCTTTT 2400 

TTGGGAACTC TAGTTCCCAG GGGAAAATAC CTCGTGCC 2438 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 542 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

Ser Gly Gly Gly Pro Trp Arg Ala Gly Gly Gly Ser Gly Lys Ser Asp 
1 5 10 15 

Ser Gly Leu Thr Val Glu Pro Gly Arg Gly Leu Thr Ala Arg Pro Pro 
20 25 30 

Pro Gly Gly Ser Arg Thr Arg Ser Gly Ser Gly Arg Ala Ser Leu Pro 
35 40 45 

Arg Leu Ser Glu Arg Arg Val Met Ala Val Val Met Ala Ala Gly Ala 
50 55 60 

Arg Thr Ala Pro Leu Glu Leu Ser Ser Glu Arg Ser Val Gln Lys Val 
65 70 75 80 

Pro Arg Arg Asn Phe Leu Leu Glu Lys Leu Lys Asn Thr Xaa Phe Ile 
85 90 95 

Thr Leu Glu Ile Val Lys Asn Leu Phe Lys Met Ala Glu Asn Asn Ser 
100 105 110 

Lys Asn Val Asp Val Arg Pro Lys Thr Ser Arg Ser Arg Ser Ala Asp 
115 120 125 

Arg Lys Asp Gly Tyr Val Trp Ser Gly Lys Lys Leu Ser Trp Ser Lys 
130 135 140 

Lys Ser Glu Ser Cys Ser Glu Ser Glu Ala Ile Gly Thr Val Glu Asn 
145 150 155 160 

Val Glu Ile Pro Leu Arg Ser Gln Glu Arg Gln Leu Ser Cys Ser Ser 
165 170 175 

Ile Glu Leu Asp Leu Asp His Ser Cys Gly His Arg Phe Leu Gly Arg 
180 185 190 

Ser Leu Lys Gln Lys Leu Gln Asp Ala Val Gly Gln Cys Phe Pro Ile 
195 200 205 

Lys Asn Cys Ser Gly Arg His Ser Pro Gly Leu Pro Ser Lys Arg Lys 
210 215 220 

Ile His Ile Ser Glu Leu Met Leu Asp Lys Cys Pro Phe Pro Pro Arg 
225 230 235 240 

Ser Asp Leu Ala Phe Arg Trp His Phe Ile Lys Arg His Thr Val Pro 
245 250 255 

Met Ser Pro Asn Ser Asp Glu Trp Val Ser Ala Asp Leu Ser Glu Arg 
260 265 270 

Lys Leu Arg Asp Ala Gln Leu Lys Arg Arg Asn Thr Glu Asp Asp Ile 
275 280 285 

Pro Cys Phe Ser His Thr Asn Gly Gln Pro Cys Val Ile Thr Ala Asn 
290 295 300 

Ser Ala Ser Cys Thr Gly Gly His Ile Thr Gly Ser Met Met Asn Leu 
305 310 315 320 

Val Thr Asn Asn Ser Ile Glu Asp Ser Asp Met Asp Ser Glu Asp Glu 
325 330 335 

Ile Ile Thr Leu Cys Thr Ser Ser Arg Lys Arg Asn Lys Pro Arg Trp 
340 345 350 

Glu Met Glu Glu Glu Ile Leu Gln Leu Glu Ala Pro Pro Lys Phe His 
355 360 365 

Thr Gln Ile Asp Tyr Val His Cys Leu Val Pro Asp Leu Leu Gln Ile 
370 375 380 

Ser Asn Asn Pro Cys Tyr Trp Gly Val Met Asp Lys Tyr Ala Ala Glu 
385 390 395 400 

Ala Leu Leu Glu Gly Lys Pro Glu Gly Thr Phe Leu Leu Arg Asp Ser 
405 410 415 

Ala Gln Glu Asp Tyr Leu Phe Ser Val Ser Phe Arg Arg Tyr Ser Arg 
420 425 430 

Ser Leu His Ala Arg Ile Glu Gln Trp Asn His Asn Phe Ser Phe Asp 
435 440 445 

Ala His Asp Pro Cys Val Phe His Ser Pro Asp Ile Thr Gly Leu Leu 
450 455 460 

Glu His Tyr Lys Asp Pro Ser Ala Cys Met Phe Phe Glu Pro Leu Leu 
465 470 475 480 

Ser Thr Pro Leu Ile Arg Thr Phe Pro Phe Ser Leu Gln His Ile Cys 
485 490 495 

Arg Thr Val Ile Cys Asn Cys Thr Thr Tyr Asp Gly Ile Asp Ala Leu 
500 505 510 

Pro Ile Pro Ser Pro Met Lys Leu Tyr Leu Lys Glu Tyr His Tyr Lys 
515 520 525 

Ser Lys Val Arg Leu Leu Arg Ile Asp Val Pro Glu Gln Gln 
530 535 540 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4999 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

CCCTCTGGGC AAGCCGCCCC CCCCCCACCC ATCTACCACA CACACACACA CACACACACA 60 

CACACATTCA GACCTTGGGG CAAAAACAAA GCAAAATAAC AACAACAAAA ACACTGCCTG 120 

TGGAAAGTCC TTACTTCAGG AAGGTTGGCA GATGAGGAGC AAGGGAACAT TTTATCAGGA 180 

CTGCCACAAA GGAGTCTTTT TTTTTAATGG TTTTTCAAGA CAGGGTTTCT CTGTATAGCC 240 

CTGGCTGTCC TGGAGCTCAC TTTGTAGACC AGGCTGGCCT CGAACTCAGA AATTCGCCTG 300 

CCTCTGCCTC CTGAGTGCTG GGATTAAAGG CGTGCAGCAC CATGTCCAAC TGGCATTTTC 360 

TCAATTAAGG TTCGTTCCTT TCAGATAACT CTAGGTTCTG GGTCAAGCTG ACACAAGGCT 420 

ACACAGCACA GTTTGTATGC CACATTCAGT TCAGAAGACA CCCAACCTCC CTGGAACTGG 480 

AACTTATGCA CATTTGTGAG CTTCCACTTG GGAGTGGGAA CCTGAACTGG GTCCTCTGCA 540 

AGAGCAGCCG TGCTCTTAAC TGCTGAGCCA TTTCAGCAGC CTCACATCAG AATTAAGTTA 600 

GAAATTAGCCG GGTATGAATC ATACCCTTAG AATCCTAGCA TCTGAAAGCA GAGCTAAGAG 660 

AAACAGGGAT TCAAGACCAG CTCTTGGCTA CAGAGCCCGT CCTGTCCTAG GATGGGCTAC 720 

AAGAGACTAT TTCAAAGCCA TCCAAACAAC AATAACTACA ACAACAACAA GGTTAAAATT 780 

AGGCTGGGCA CAGGGTACAC ACCTTTAATG CCAACACTCA GGAGGCAGAG GCAGGCTGAT 840 

CAGTGTGAGT TTGAGTTCAA CGTGGTCTAC ATAGGGAGTT CTAGGCCAGC AGAGGTTACA 900 

GTCTCTCTCT CTCTCTCTCT CTCTCTCTCT CTCTCACACA CACACACACA CACACACACA 960 

CACACACACA CACACACGGT GGCATTATGG GATTTTTTTG GGATAAGGTT TCTCTGTCTA 1020 

GCCCTGGCAT AGATTCACTC TGTAGACTAG GCTAGCCTTG AACTCAGAGA TCCGCCTGCC 1080 

TCTGCCTCCC AAGTGCTGGG ATTATAGGTG TTGCACCACC ACTGCCCAGC CACTTTGGGA 1140 

TTTTTGAACT GTTATCAAGA GGCTTTCGAG GAGGTCAAAC TTCAACAGCA ACCTCTCCAT 1200 

GATAATGTAG CTAATGATCA AACGACACTC AAAACTTAAC CCTTAAAGCA CACATCCACC 1260 

AGACAGCGTG CCCACTCGTA GTTCCATTAC TCAGGAGGCT GAAGCAGGAG GATGAAGGAC 1320 

TAAGGCTTCA GCAACCTAGG GAGCCGCAGG GGACAGTAGT CTCAATCCCT ACATTCTCCT 1380 

GAACACAGGA GCAGGAGTTC AGGAAGGGTG TCAAGGCCGC TTACTGATCT TAGGGCCTCA 1440 

GGAATGACTA GCTCAGGCAG AGAGAGCAAA GGTCTCCAGT GGAGAAGTCT ACACACACAC 1500 

ACACACACAC ACACACACAC ACACACACAC AGAATCCAAG GCGATGACGT CATCAAAGGG 1560 

TTAATTCTAG TCTGGGATGG GGGGGAGGGT GGGGCACGCA GCTGTCAGGT GGCTTTGGAA 1620 

AAATAAACTG CTGAAGAGTC TGACGCCAGG GAGTCCTGGG AGGGACAAGA GGTTACCCAC 1680 

TCAAAGAGTG TGCTCCACAA AGCATGCGCG CTTGTCCACG TCTGGAGTCG TCACTTATTT 1740 

TTTGCCTGGA TTCTTTGTAG CCGGTGGGTT CTCAAGGCGG TAAGTGGTGT GGCCGCCGTG 1800 

GTCTGGGAGG TGACGATAGG GTTAATCGTC CACAGAGCCC AGGGGCGGAG CGCGGGCGGG 1860 

CGTCCGCAGC CCCGCTGGAG CCGGAAGCAG TGGCTGGTCA GGGGCGCTTC TAGCCTTCCC 1920 

TATCTGTACT TCCACAGAGG TCTCTGCGAG CTAGGGGGAC AGTGAGGTGC GGGGTAGGGG 1980 

CCCGGCGTTA GAGCCAGCAA GGGGACGGTT CACGGTAAGG TCTGAGGGAG AGAGAGCTCC 2040 

TGAGAAACTT GGGGGGCGCG ACACAGATAG GGTGAAAGCA GAGTGATAGA CCTGGGATGG 2100 

TTAGGGGACC AAGGGAAGAC CAGGCTGGTT GGCATACACC GGTGAACGGA TGGGAGTCCT 2160 

AGGGAAAGAT GATGCGCCTA ACAGTCCTTT CTGTCTCCAC ACCACTCCAG GGGACGATCC 2220 

GGAGCTCAAC TTTCAAAAGC GAGACGCCCC AGCAAGCCTG TTTTGAGAAG TTCTTCAGCG 2280 

GCTCTCCTCA TGGGCCAGAC GGCCCTCGCA AGGGGCAGCA GCAGCACCCC TACCTCGCAG 2340 

GCTCTGTACT CGGACTTCTC TCCTCCCGAG GGCTTGGAGG AGCTCCTGTC TGCTCCCCCT 2400 

CCTGACCTGG TTGCCCAACG GCACCACGGC TGGAACCCCA AGGATTGCTC CGAGAACATC 2460 

GATGTCAAGG AAGGGGGTCT GTGCTTTGAG CGGCGCCCTG TGGCCCAGAG CACTGATGGA 2520 

GTCCGGGGGA AACGGGGCTA TTCGAGAGGT CTGCACGCCT GGGAGATCAG CTGGCCCCTG 2580 

GAGCAAAGGG GCACACACGC CGTGGTGGGC GTGGCCACCG CCCTCGCCCC GCTGCAGGCT 2640 

GACCACTATG CGGCGCTTTT GGGCAGCAAC AGCGAGTCCT GGGGCTGGGA TATTGGGCGG 2700 

GGAAAATTGT ATCATCAGAG TAAGGGCCTC GAGGCCCCCC AGTATCCAGC TGGACCTCAG 2760 

GGTGAGCAGC TAGTGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGGACT 2820 

CTTGGCTACT CTATTGGGGG CACGTACCTG GGACCAGCCT TCCGTGGACT GAAGGGGAGG 2880 

ACCCTCTATC CCTCTGTAAG TGCTGTTTGG GGCCAGTGCC AGGTCCGCAT CCGCTACATG 2940 

GGCGAAAGAA GAGGTGAGAT ACGGACTAGG TGTGGGGAGA TCACTACTCT TGGCAATGGT 3000 

TTGGGCTGGA AACTCATGGT TGGAGCACAG GAAGTAGGCT TCTTGTCACT TTGGCCTGTC 3060 

ACTTAGATGG CCTTGGATCT AGCTTCACTC CCAATCCCTA TTGGATGTGA TGCACAAATT 3120 

CAGAGCCTTT GGGTCTCCCT CAGCTGAGGT GGCGGTGGAA ATGGAGGAAG AAGGAAGGGT 3180 

GCCTGAGCAG GATCTCAAGT TCAAGGATGC CTGGAGTTGC TTACTTACCT TGTCTTCCTT 3240 

CTCTCTCCGC AGTGGAGGAA CCACAATCCC TTCTGCACCT GAGCCGCCTG TGTGTGCGCC 3300 

ATGCTCTGGG GGACACCCGG CTGGGTCAAA TATCCACTCT GCCTTTGCCC CCTGCCATGA 3360 

AGCGCTATCT GCTCTACAAA TGACCCAGTA GTACAGGGTG TGCTGGCACC CTACCGTGGG 3420 

GACAGGTGGA GAGGCACCCG CTGGCCTAGA CAACTTTAAA AAGCTGGTGA AGCTGGGGGG 3480 

GGGGGGCTGG ACCCCTTCAC CTCCCCTTCT CACAGGAGCA AGACATATAG AAATGATATT 3540 

AAACACCATG GCAGCCTGGG ACAAAGAGGT TTTTGAAGTA AAAAATGAGA TGTATTGTCA 3600 

CAACCTGTTT CATTATTGTT TTTTGTTTTG TTTTACACTC CCCCACCCCA GGCTAGAGCC 3660 

CCATCACTGT CTTAAGGAAT TATGACAACC CACAAAGCTC AGGCCCAGGT GTTTATTTCC 3720 

CTTACATGTA GGATGGTTCA CAAACACAAT ACAGGGGCTT TGGCACCGTG GGGGAGGGGA 3780 

CTATCCCAGG CCTCTTAGGG TCTCATGTAT ACCGAATTCA GACCCGAAAG CTCTGAATTT 3840 

CTGCATCAGA CATCCAGTAG AACTTGGGAG TGAAGCTAGA GCCAAGGCCA TCTAAGTGAC 3900 

AGGCCAAAGT GACACGAAGC CCACTTCCTG TGCTCCAACC ATGAGTTTCC AGCCCAAACC 3960 

AATGGAAGGT GATTTCACTT GTCAGGGCCC AAAGGGACAG TCAGTTCTAC TCCCTCCCCT 4020 

CACTAGGAGC CACCTTGGTG ACAGTTGATT CTACCCACTG TAAGTGGTAA AGGGATTGGC 4080 

CTGGTCCCAA CCATAATAGG GCGGTGGAAA CGGCTCAGGA GGGTACAGCG TGGATTAGGC 4140 

CACAAGATGG GGCAGATGAT GTCATCAGAA GCATGTGACC GGTGGGAGCA GTTACTAAAC 4200 

TTCTGGGCAA CCTAGTCCAT GCTATGCAGG CAGGTAGAGG GATGGGCAGT GCTCATTGTT 4260 

TGGCATTGAT GATGTCCACA AATTCAGGCT TGAGAGATGC GCCACCCACA AGGAAGCCGT 4320 

CCACGTCAGG CTGGCTTGCC AGCTCTTTGC AGGTTGCTCC AGTCACAGAA CCTGTACCAG 4380 

GAACAAGAAG ACAGTTTGGT CAGGTCTATG ATCAGAACAC TTAAGCCCCA CCTCTCTGTG 4440 

CAAGGCAGCC TCAGTCTGTC TTAGCCCATT TCCGTCTTAG CTAGAGCCAA AGCCACTCAC 4500 

CTCCATAAAT GATCCGGGTG CTCTGAGCCA CCCCATCATT GACATTGGAT TTCAGCCATC 4560 

CCCGGAGCTT CTCGTGTACT TCCTGTGCCT AGAAGGAGGA GGCAGAGCTA CTAAGTAAGC 4620 

TCCTTCCTAT CTATCATTCA AGGAGTAAAA ACCACTGGTT CTCACATAGA GTTGAGTTTC 4680 
CAGAAAAGCC CCGGGACCAG AGAGTGGCAA GGCTCCAATC CCACCAGGCT TGGAATGAAC 4740 
ATTTTTGGCA AAGTCACTCT CCTTGGTGAG TTTGGGGGCC CTCTGTCTCT AAAGGGGCTT 4800 
GGATGGGCTC CATAGCTGTG TGAGTCTGTT AAAGCCGGAC AGGCTGAGGA GCTCTGGGTA 4860 
GTTACCTGCT GAGGGGTTGC CGTCTTGCCA GTCCCAATGG CCCACACAGG TTCATAGGCC 4920 
AGGACCACCT TGCTCCAGTC TTTCACATTA TCTGTGGGGC AGAGAGGAGA GTGAGTAGGA 4980 
AGGAGCTGAC CCGCCAAGC 4999 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 264 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Met Gly Gln Thr Ala Leu Ala Arg Gly Ser Ser Ser Thr Pro Thr Ser 
1 5 10 15 

Gln Ala Leu Tyr Ser Asp Phe Ser Pro Pro Glu Gly Leu Glu Glu Leu 
20 25 30 

Leu Ser Ala Pro Pro Pro Asp Leu Val Ala Gln Arg His His Gly Trp 
35 40 45 

Asn Pro Lys Asp Cys Ser Glu Asn Ile Asp Val Lys Glu Gly Gly Leu 
50 55 60 

Cys Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Val Arg Gly 
65 70 75 80 

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro 
85 90 95 

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu 
100 105 110 

Ala Pro Leu Gln Ala Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser
115 120 125

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser
130 135 140

Lys Gly Leu Glu Ala Pro Gln Tyr Pro Ala Gly Pro Gln Gly Glu Gln
145 150 155 160

Leu Val Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly
165 170 175

Thr Leu Gly Tyr Ser Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg
180 185 190

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ser Val Ser Ala Val Trp Gly
195 200 205

Gln Cys Gln Val Arg Ile Arg Tyr Met Gly Glu Arg Arg Val Glu Glu
210 215 220

Pro Gln Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Ala Leu
225 230 235 240

Gly Asp Thr Arg Leu Gly Gln Ile Ser Thr Leu Pro Leu Pro Pro Ala
245 250 255

Met Lys Arg Tyr Leu Leu Tyr Lys
260



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5615 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

GTACTTTCTT TATATCTCCA TAATTTTATT TACTATTACT ACATGATACA TTATTTTATA 60 

AAAGTCTTTG TAACCTCCTT AAGGATTCAC TGCTTAATCT CCAGTGCTTA GCACAAATCA 120 

TTAAATGCGA ACCAGAAACT CTTCCAAATG TGTTACATCT ATAACCTCAT TGGATTCTCA 180 

CTACCAACCC CATGCAATAG ATACTAATGT GATCTCTGTC TTACAGAGGA AGAAACAGGC 240

AGAGGGAGGT TCAGTAATTT GCCCAAGGTC ATACACACAC TGGCCTTCAG GTATTCATGC 300 

CCGGGGAGTC TGGTCCCACA GCTGGCATGT TTGCCATTAT ATTATATTGC CTCCTTATAG 360

TGTCGGCACT CATTAAGCAC ATTGACAGCT ATGCTTGGTG AGTGACTACT ATGTACCCAG 420

CTCTGTGCTA CATGCTTTAC CTGGATTATT TCAACTGCAC AACAACCCTG TGAGGTAACT 480 

ACCATCATTG CTCCTATTTT ACATAACAGA AAACTACAGA AATCTGGGGC TGGGCGTAGT 540 

GGCTCATGCC TGAAATCCCA GCACTTTGGG AGACCCTGTC TCTAAAAAAA ATTTTTTTTT 600 

GGCCGGACGT GGTGGCTCAC ACCTGTAATC TCAGCACTTT GGGAGGCTAA GGCAGGCAGA 660 

TCACAAGGTC AGGAGTTCTA GACCAGCCTG GCCAACATGG CAAAACCCTG TGTCTACTAA 720 

AAATACAAAA AATAGCTAGG CGTGGTGGCA GGTGCCTGTA ATCCCAGCTA CTCAGGAGGC 780 

TGAGGCAGGA GAATCCCCTG AACCTGGGAG ATGGAGGTTA CAGAGAGCCG AGATCGTGCC 840

GCTGCACTCC AGCCTGGGCA ACAAGAGCAA GACTCTGTCT CGAAAAAAAT AAAAATAAAA 900 

ATAAAAATAT TTTTTTAAAA ATTAGCTGGG TGTGGTAGCA CATGCCTGTA GTCCCAGCTA 960 

CTTGGGAGGC TGAGGTAGGA GGATCACTTG AGCCCAGGAG GTCAAGGCTG CAGTGGGCTG 1020 

TGATGGCGCC ACTGCACTCT AGCCTTGGTG ACAGCAAGAC CCTGTCTCAA AAAAAAAAAA 1080 

AAGAGAAATC GGGCAACTTC CCCAAGATCG CGCAGTTAAC TAGTGGCATA GCTTCACTCA 1140 

AACTCGAAGT CTTAATCAGG ACACTCTACC AAATGAGATC AACGGCTCAG TAATGGATTG 1200 

GCATCCAGTA TGAAGACTGG ACCAGCAGGG AGAACTATGA TGCGTACAGC CTAGAGCCTG 1260 

AAGCAGATTT CACAGCCTCA GAGGTGGCAC AGGCTGACTC ACAACCCGGG GCAGAAAGGG 1320 
ACCAGCCCAG AAACAGTGAC CCAGAATCAC AGGGAAGTAG AAATGGGATT CGGCACAATG 1380

AAGCCCCTCC TTGACCCCAT GCTCCTTACC CTCAGGGGCG CAGGAGTTAG TCGCTCAGGC 1440

GGCTCAAAGG TCTTGACGGT GGAGAACACC ATCCCCAGGG ATTCCCGACG CGGTGATGCC 1500 

ATCAAAGCGT TAATTCTGAG ATGGGCCTGC CCGGGTGCGG ACTCTGCCGC AGCAAGAGAA 1560 

GGGTTAACTG CCCCGGGCCT TCGCCGTGGG GGCGGGGCCT CGGGGAGGGT CACAGCCCGG 1620

GACTGAGACC CGAGGTTAAC CGCCCGGGGT GGGCTCCACG GGGGCGGGGC ATGCTCTCCG 1680

CGGCTGCTGC CGGTATAGAG CGGTAACTGC CCAGGAGGGG GCGGGGCCCC ACAGGGGCGT 1740 

GGCGTCGGAG CTGCACGGCC GTGGGCGGCG ATGAGAGGGT TAAGCCCCAG AGGGCCCTGG 1800

AGGGGCGGGG CCGCGGGACG GGCTCGGCCC AAGGGAGGAG CTGGGGGCGG AAGCGGCCGG 1860 

CGGTCTGCGC CCTGCGCGCC TCGGCTTCTT TCCGCCCGGC TCCTTCAGAG GCCCGGCGAC 1920

CTCCAGGGCT GGGAAGTCAA CCGAGGTTCG GGGGCAGCGG CGAGGGCTCC GGGCGAGTAA 1980 

GGGGGATGGT CCATGCTGAG GCCCAAATGG GGCGAACTCG CGAGAGTCTC TGGCGACCTG 2040 

GATCAGATGG GGCGAGGGCA GATGAAGGGC CCAGGAGCTT TGGGGCAGCG AGGAGGGAGG 2100 

AGCGGGCCCG TTGGCAAACT TGGGTGAAAG GATGGGGTAC CTGGGTGACG AGCCCCCGCC 2160 

AGGATTCTGC TCTTCACGCC CCTTTTCTCC CAGCTCCCTT CCAGGTCAAT CCAAACTGGA 2220 

GCTCAACTTT CAGAAGAGAA AGACGCCCCA GCAAGCCTCT TTCGGGGAGT CCTCTAGCTC 2280

CTCACCTCCA TGGGCCAGAC AGCTCTGGCA GGGGGCAGCA GCAGCACCCC CACGCCACAG 2340 

GCCCTGTACC CTGACCTCTC CTGTCCCGAG GGCTTGGAAG AGCTGCTGTC TGCACCCCCT 2400 

CCTGACCTGG GGGCCCAGCG GCGCCACGGT TGGAACCCCA AAGACTGTTC AGAGAACATC 2460 

GAGGTCAAGG AAGGAGGGTT GTACTTTGAG CGGCGGCCCG TGGCCCAGAG CACTGATGGG 2520 

GCCCGGGGTA AGAGGGGCTA TTCAAGGGGC CTGCACGCCT GGGAGATCAG CTGGCCCCTA 2580 

GAGCAGAGGG GCACGCATGC CGTGGTGGGC GTGGCCACGG CCCTCGCCCC GCTGCAGACT 2640 

GACCACTACG CGGCGCTGCT GGGCAGCAAC AGCGAGTCGT GGGGCTGGGA CATCGGGCGG 2700 
GGGAAGCTGT AGCATCAGAG CAAGGGGCCC GGAGCCCCCC AGTATCCAGC GGGAACTCAG 2760

GGTGAGCAGC TGGAGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGAACT 2820

CTGGGCTACG CTATTGGGGG CACCTACCTG GGGCCAGCAT TCCGCGGACT GAAGGGCAGG 2880

ACCCTCTATC CGGCAGTAAG CGCTGTCTGG GGCCAGTGCC AGGTCCGCAT CCGCTACCTG 2940

GGCGAAAGGA GAGGTGAGGC CTGGGGCAGA CGTGGGGAGA ACTTTCTGTC CCTGGTGGCA 3000

GTGGTTTGGG ATGGAAACTC TTCTGACAAG AGCAGAGGGG ATGGACCTTC ATCCAGCCTG 3060

CCTCAACCTC TGTTCAGTGC TGGGAAAGGC TAGGGGTCTT CACAGCTGTT ATTTAATTTA 3120

ACCCAACAGC AATAGAGGTG AAACAGGCTT GAGAAAGCAA CTTTCTCAAG TTCTCTTGGC 3180

CAGTAAATGG TGAACCTTCA GAATGGAGGG AGGAACTGCA GGGATGAGAG AATTCAGGAG 3240

ATATCAACCC CTGAGCAAGA GGTGCAAAGC GTTAGGTACT GGGTTTGATG TACAGGTCCA 3300

AAAGAAGGAT GGGCAGAGCC AGGTACCCAG GCTGTATACC GGATTCCCTG GGCTCTAACC 3360

TGTCTCTGTG CCACATACCT ACTTCCTTCC TCAGCCACAC CTCTGGATGG AGACACTGGG 3420

GCCCTGGGCA CCAGGGAGGA GAGCAGTGGA GGAGGCAGGG CCTTAGGGTG GGGCAGCAGG 3480

GGAGGAGCCT CCCCAGGAAC TGACTGGGTC CAGGGCTTGG AGCTGCTCTC TGCAGTTGTG 3540

TGGGCTGTAG AGTGGAGGGC CATCCCTCCT CACCTCAGCC CCAGCTCCCA AGCCTCTGGA 3600

GttGAAAGCCT GGGCCAGCTC CACCACTGTC AGAGCCACCT TGGCCTGTTG TTTAGAGGGC 3660

CTTAGCCAGC TCTTCACCCC CAGCTCTGAC TAGGGATGTG TGAAATCTTA TCTGGGAGGC 3720

AGAACTTCCG GGTATCTCAA ATTCCCCTTT CAGCCAGGTG GGCACACTCG AAGCAGGAAA 3780

GCAGAAAGGC ATCTGAGTAG GACCCCGTAG TTTGAGGACA TCTGGCTGGT GGCTGCACCC 3840

ATACTTACAT TCCCCTCCTT CTCTCTCCCA GCGGAGCCAC ACTCCCTTCT GCACCTGAGC 3900

CGCCTGTGTG TGCGCCACAA CCTGGGGGAT ACCCGGCTCG GCCAGGTGTC TGCCCTGCCC 3960

TTGCCCCCTG CCATGAAGCG CTACCTGCTC TACCAGTGAG CCCTGTGATA CCACAGACTG 4020

TGCTGAGGTC TTGCCACCAC CCCTCCCCTT GGGGAGGTGG GGAGGCACTG CTGGCCTAGA 4080

CCAGCTGCTG AAAGCTGGTG AGGCTGAGCC CCTACCCCAA CCCAAGCTCT GCGGAAATCA 4140

ACAGCCCCAG AGCCACTTGG AGGGAGGAAG AAAGGGAGCC GGCGTTCAAG GCTATGACAG 4200

TCTGCTACGC AAAACATTTT TTCAAGTAAA AATAGTAAGA GATGTTGTTA TAGAAACCTG 4260 

TTCTTGTTTT TTTTTTTTTC TTGCACAAAT GATCATTTAT ATAGCTGCCT CAAAAAGGAA 4320

GATTATCTGG GCAAGTCCAG TGAAGGCAGA CAAACCACAA GACCTAGTGC CAGGTTTATT 4380

CCCTCACATG GGTGGTTCAC ATACACAGCA CAGAGGCACG GGCACCATGG GAGAGGGCAG 4440 

CACTCCTGCC TTCTCAGGGG ATCTTGGCCT CACGGTGTAA GAAGGGAGAG GATGGTTTCT 4500 

CTTCTGCCCT CACTAGGGCC TAGGGAACCC AGGAGCAAAT CCCACCACGC CTTCCATCTC 4560 

TCAGCCAAGG AGAAGCCACC TTGGTGACGT TTAGTTCCAA CCATTATAGT AAGTGGAGAA 4620

GGGATTGGCC TGGTCCCAAC CATTACAGGG TGAAGATATA AACAGTAAAG GAAGATACAG 4680

TTTGGATGAG GCCACAGGAA GGAGCAGATG ACACCATCAG AAGCATATGC AGGGAAAGGG 4740 

CAGTTACTGG GCTTCTGGGC TGCTTAGTCC CTGGCTTGGC AGGAAGGGTA GGGAAGATGG 4800 

ATGGGGCTCA TTGTTTGGCA TTGATGATGT CCACGAATTC GGGCTTGAGG GAAGCACCAC 4860 

CCACAAGGAA GCCATCCACA TCAGGCTGGC TGGCCAGCTC CTTGCAGGTT GCCCCAGTCA 4920 

CAGAGCCTGG GAAGGGAGCA GAACAAGGGC TTGGTCAAGA ATGGGATGAG TCTGCCCCAT 4980 

CCCCACCTCC ATGTCCGAGG GCTCAGTCTA GTCCTCAGCC CACTCCACCT CAGCCGGGAA 5040

CCAAAGCCAC TCACCTCCAT AAATGATACG GGTGCTCTGA GCCACCGCAT CAGAGACGTT 5100 

GGACTTCAGC CATCCTCGGA GCTTCTCGTG TACTTCCTGG GCCTAGAACA AGAAGCTGGC 5160 

CTAAGTAAGA CCTTTTCTGC CTCTCTAAGA GGAAAAATCA CTGGCACCAG TGGACACTTA 5220 

GTGTGGTTTC TGACTGAGTC AGAGTACCAG GGCTCTGATC CAAGCCAGGC CCTGGACTGG 5280

ATGCCCTTGG ACAAGTCACT GTCTCTGGGT TCAAGGTCTC TGTGTCTTTG AAATAAGGGG 5340 

TTGCCCCATG TGGGCTGTGT CTGTCCAAAC CTATTGAGGC AGGCTGGGAT GAGGGCAGGG 5400

CTCCTGGGCC CGGTTACCTG TTGGGGTGTT GCAGTCTTGC CAGTACCAAT GGCCCACACA 5460
GGCTCATAGG CCAGGACGAC CTTGCTCCAG TCCTTCACGT TATCTGCAGG GCAGAGATAC 5520
AGATGGAGGG AAGGGTGAAC AAGAAAGAGC TCTCCAGCCA GGTTCTCCGG AGTACGAAGA 5580 
ACGGTGGCCT ACTGCCCCCT AGTGGACATT GGGGG 5615 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 263 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Met Gly Gln Thr Ala Leu Ala Gly Gly Ser Ser Ser Thr Pro Thr Pro
1 5 10 15

Gln Ala Leu Tyr Pro Asp Leu Ser Cys Pro Glu Gly Leu Glu Glu Leu
20 25 30

Leu Ser Ala Pro Pro Pro Asp Leu Gly Ala Gln Arg Arg His Gly Trp
35 40 45

Asn Pro Lys Asp Cys Ser Glu Asn Ile Glu Val Lys Glu Gly Gly Leu
50 55 60

Tyr Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Ala Arg Gly
65 70 75 80

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro
85 90 95

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu
100 105 110

Ala Pro Leu Gln Thr Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser
115 120 125

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser
130 135 140

Lys Gly Pro Gly Ala Pro Gln Tyr Pro Ala Gly Thr Gln Gly Glu Gln
145 150 155 160

Leu Glu Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly
165 170 175

Thr Leu Gly Tyr Ala Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg
180 185 190

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ala Val Ser Ala Val Trp Gly
195 200 205

Gln Cys Gln Val Arg Ile Arg Tyr Leu Gly Glu Arg Arg Ala Glu Pro
210 215 220

His Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Asn Leu Gly
225 230 235 240

Asp Thr Arg Leu Gly Gln Val Ser Ala Leu Pro Leu Pro Pro Ala Met
245 250 255

Lys Arg Tyr Leu Leu Tyr Gln
260

(2) INFORMATION FOR SEQ ID NO:49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
AGCTAGATCT GGACCCTACA ATGGCAGC 28 

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
AGCTAGATCT GCCATCCTAC TCGAGGGGCC AGCTGG 36
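
The nucleotide listings above are printed in groups of ten bases, six groups to a line, with a running base count at the end of each line; the single line of SEQ ID NO: 49, for example, ends in 28, which matches its stated length. The following is a minimal Python sketch of a consistency check for lines in that layout. The function name `check_running_totals` is hypothetical and not part of the specification, and the sketch assumes each line consists only of base groups followed by one integer.

```python
def check_running_totals(lines):
    """Compare each line's trailing count with the cumulative number of bases.

    Returns a list of (stated, counted) pairs for the lines that disagree.
    Hypothetical helper for illustration; not part of the specification.
    """
    mismatches = []
    total = 0
    for line in lines:
        parts = line.split()
        # A numbered sequence line is: one or more base groups, then an integer.
        if len(parts) < 2 or not parts[-1].isdigit():
            continue
        groups = parts[:-1]
        if not all(group.isalpha() for group in groups):
            continue  # e.g. a row of residue position numbers
        total += sum(len(group) for group in groups)
        stated = int(parts[-1])
        if stated != total:
            mismatches.append((stated, total))
    return mismatches


# SEQ ID NO: 49 as listed: 10 + 10 + 8 bases, stated total 28.
seq49 = ["AGCTAGATCT GGACCCTACA ATGGCAGC 28"]
print(check_running_totals(seq49))
```

An empty result means every stated total agrees with the bases counted so far. A mismatch only flags a line for inspection; it does not say whether the bases or the printed count is at fault.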
CLAIMS:

1. A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C
wherein said protein comprises a SOCS box in its C-terminal region.

2. A nucleic acid molecule according to claim 1 wherein the protein further comprises a
protein:molecule interacting region.

3. A nucleic acid molecule according to claim 1 wherein the protein:molecule interacting
region is located in a region N-terminal of the SOCS box.

4. A nucleic acid molecule according to claim 2 or 3 wherein the protein:molecule
interacting region is a protein:DNA binding region or a protein:protein binding region.

5. A nucleic acid molecule according to claim 4 wherein the protein:molecule interacting
region is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats.

6. A nucleic acid molecule according to any one of claims 1-5 wherein the SOCS box 
comprises the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;

X2 is any amino acid residue;

X3 is P, T or S;

X4 is L, I, V, M, A or P;

X5 is any amino acid;

X6 is any amino acid;

X7 is L, I, V, M, A, F, Y or W;

X8 is C, T or S;

X9 is R, K or H;

X10 is any amino acid;

X11 is any amino acid;

X12 is L, I, V, M, A or P;

X13 is any amino acid;

X14 is any amino acid;

X15 is any amino acid;

X16 is L, I, V, M, A, P, G, C, T or S;

[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;

X17 is L, I, V, M, A or P;

X18 is any amino acid;

X19 is any amino acid;

X20 is L, I, V, M, A or P;

X21 is P;

X22 is L, I, V, M, A, P or G;

X23 is P or N;

[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;

X24 is L, I, V, M, A or P;

X25 is any amino acid;

X26 is any amino acid;

X27 is Y or F; and

X28 is L, I, V, M, A or P.
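
The residue-by-residue definition above can be read as a pattern over one-letter amino acid codes. The following minimal Python sketch expresses it as a regular expression, assuming the input is a clean one-letter protein string and taking both variable-length stretches as 1 to 50 residues, as recited. The names `SOCS_BOX_PATTERN` and `find_socs_box` are hypothetical, and the sketch is an illustration of the recited pattern, not a method described in the specification.

```python
import re

# One character class per position, in the order recited in the claim.
SOCS_BOX_PATTERN = re.compile(
    "[LIVMAP]"      # X1
    "."             # X2  (any amino acid)
    "[PTS]"         # X3
    "[LIVMAP]"      # X4
    ".."            # X5, X6
    "[LIVMAFYW]"    # X7
    "[CTS]"         # X8
    "[RKH]"         # X9
    ".."            # X10, X11
    "[LIVMAP]"      # X12
    "..."           # X13, X14, X15
    "[LIVMAPGCTS]"  # X16
    ".{1,50}"       # first variable stretch, 1 to 50 residues
    "[LIVMAP]"      # X17
    ".."            # X18, X19
    "[LIVMAP]"      # X20
    "P"             # X21
    "[LIVMAPG]"     # X22
    "[PN]"          # X23
    ".{1,50}"       # second variable stretch, 1 to 50 residues
    "[LIVMAP]"      # X24
    ".."            # X25, X26
    "[YF]"          # X27
    "[LIVMAP]"      # X28
)


def find_socs_box(protein):
    """Return the first substring matching the pattern, or None.

    Assumes `protein` holds only one-letter amino acid codes; "." would
    otherwise also accept non-residue characters.
    """
    match = SOCS_BOX_PATTERN.search(protein.upper())
    return match.group(0) if match else None


# Last 40 residues of SEQ ID NO: 48 as listed, converted to one-letter codes.
c_terminus = "PHSLLHLSRLCVRHNLGDTRLGQVSALPLPPAMKRYLLYQ"
print(find_socs_box(c_terminus))
```

That this C-terminal stretch satisfies the pattern is an observation made with the sketch; the claims themselves do not state which residues of SEQ ID NO: 48 form the SOCS box.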



7. A nucleic acid molecule according to claim 6 wherein the protein modulates signal
transduction.

8. A nucleic acid molecule according to claim 7 wherein the signal transduction is modulated
by a cytokine or a hormone, a microbe or a microbial product, a parasite, an antigen or other 
effector molecule. 



9. A nucleic acid molecule according to claim 8 wherein the protein modulates cytokine- 
mediated signal transduction. 

10. A nucleic acid molecule according to claim 9 wherein the signal transduction is mediated
by one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13,
IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.

11. A nucleic acid molecule according to claim 10 wherein the signal transduction is mediated
by one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin.

12. A nucleic acid molecule according to claim 11 wherein the signal transduction is mediated
by IL-6.

13. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence encodes
an amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO.
8, SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ
ID NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO.
46 or SEQ ID NO. 48 or an amino acid sequence having at least about 15% similarity to all or
part of the listed sequences or a nucleotide sequence which hybridizes to the nucleic acid
molecule under low stringency conditions at 42°C.

14. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence is 
substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ 
ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO. 
20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ 
ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 
34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ 
ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a nucleotide sequence
having at least 15% similarity to all or a part of the listed sequences or a nucleotide sequence
capable of hybridizing to the listed sequences under low stringency conditions at 42°C.

15. A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C
wherein said protein exhibits the following characteristics:

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises 
the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;

X2 is any amino acid residue;

X3 is P, T or S;

X4 is L, I, V, M, A or P;

X5 is any amino acid;

X6 is any amino acid;

X7 is L, I, V, M, A, F, Y or W;

X8 is C, T or S;

X9 is R, K or H;

X10 is any amino acid;

X11 is any amino acid;

X12 is L, I, V, M, A or P;

X13 is any amino acid;

X14 is any amino acid;

X15 is any amino acid;

X16 is L, I, V, M, A, P, G, C, T or S;

[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;

X17 is L, I, V, M, A or P;

X18 is any amino acid;

X19 is any amino acid;

X20 is L, I, V, M, A or P;

X21 is P;

X22 is L, I, V, M, A, P or G;

X23 is P or N;

[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;

X24 is L, I, V, M, A or P;

X25 is any amino acid;

X26 is any amino acid;

X27 is Y or F;

X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats 
or other protein:molecule interacting domain in a region N-terminal of the SOCS box; 
and 

(iii) modulates signal transduction. 



16. An isolated protein or a derivative, homologue or mimetic thereof comprising a SOCS
box in its C-terminal region.

17. An isolated protein according to claim 16 wherein the protein further comprises a
protein:molecule interacting region.

18. An isolated protein according to claim 17 wherein the protein:molecule interacting region
is located in a region N-terminal of the SOCS box.

19. An isolated protein according to claim 16 or 17 wherein the protein:molecule interacting
region is a protein:DNA binding region or a protein:protein binding region.

20. An isolated protein according to claim 19 wherein the protein:molecule interacting region
is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats.

21. An isolated protein according to any one of claims 16-20 wherein the SOCS box
comprises the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;

X2 is any amino acid residue;

X3 is P, T or S;

X4 is L, I, V, M, A or P;

X5 is any amino acid;

X6 is any amino acid;

X7 is L, I, V, M, A, F, Y or W;

X8 is C, T or S;

X9 is R, K or H;

X10 is any amino acid;

X11 is any amino acid;

X12 is L, I, V, M, A or P;

X13 is any amino acid;

X14 is any amino acid;

X15 is any amino acid;

X16 is L, I, V, M, A, P, G, C, T or S;

[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;

X17 is L, I, V, M, A or P;

X18 is any amino acid;

X19 is any amino acid;

X20 is L, I, V, M, A or P;

X21 is P;

X22 is L, I, V, M, A, P or G;

X23 is P or N;

[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;

X24 is L, I, V, M, A or P;

X25 is any amino acid;

X26 is any amino acid;

X27 is Y or F; and

X28 is L, I, V, M, A or P.

22. An isolated protein according to claim 21 wherein the protein modulates signal 
transduction. 



23. An isolated protein according to claim 22 wherein the signal transduction is modulated 
by a cytokine or other endogenous molecule, a hormone, a microbe or a microbial product, a 
parasite, an antigen or other effector molecule. 

24. An isolated protein according to claim 23 wherein the protein modulates cytokine- 
mediated signal transduction. 

25. An isolated protein according to claim 24 wherein the signal transduction is mediated
by one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13,
IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.

26. An isolated protein according to claim 25 wherein the signal transduction is mediated by
one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin.

27. An isolated protein according to claim 26 wherein the signal transduction is mediated by 
IL-6. 

28. An isolated protein according to claim 16 wherein said protein comprises an amino acid 
sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 8, SEQ ID 
NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ ID NO. 25, 
SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 46 or SEQ 
ID NO. 48 or an amino acid sequence having at least about 15% similarity to all or part of the 
listed sequences. 

29. An isolated protein according to claim 16 wherein the said protein is encoded by a 
nucleotide sequence substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, 
SEQ ID NO. 9, SEQ ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID
NO. 17, SEQ ID NO. 20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, 
SEQ ID NO. 27, SEQ ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID 
NO. 33, SEQ ID NO. 34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, 
SEQ ID NO. 40, SEQ ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a 
nucleotide sequence having at least 15% similarity to all or a part of the listed sequences or a 
nucleotide sequence capable of hybridizing to the listed sequences under low stringency 
conditions at 42 °C. 

30. An isolated protein or a derivative, homologue, analogue or mimetic thereof having the 
following characteristics: 

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises
the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;

X2 is any amino acid residue;

X3 is P, T or S;

X4 is L, I, V, M, A or P;

X5 is any amino acid;

X6 is any amino acid;

X7 is L, I, V, M, A, F, Y or W;

X8 is C, T or S;

X9 is R, K or H;

X10 is any amino acid;

X11 is any amino acid;

X12 is L, I, V, M, A or P;

X13 is any amino acid;

X14 is any amino acid;

X15 is any amino acid;

X16 is L, I, V, M, A, P, G, C, T or S;

[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;

X17 is L, I, V, M, A or P;

X18 is any amino acid;

X19 is any amino acid;

X20 is L, I, V, M, A or P;

X21 is P;

X22 is L, I, V, M, A, P or G;

X23 is P or N;

[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;

X24 is L, I, V, M, A or P;

X25 is any amino acid;

X26 is any amino acid;

X27 is Y or F;

X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats
or other protein:molecule interacting domain in a region N-terminal of the SOCS box;
and

(iii) modulates signal transduction. 

31. A method of modulating levels of a SOCS protein in a cell said method comprising 
contacting a cell containing a SOCS gene with an effective amount of a modulator of SOCS gene 
expression or SOCS protein activity for a time and under conditions sufficient to modulate levels 
of said SOCS protein. 

32. A method of modulating signal transduction in a cell containing a SOCS gene comprising 
contacting said cell with an effective amount of a modulator of SOCS gene expression or SOCS 
protein activity for a time sufficient to modulate signal transduction. 

33. A method of influencing interaction between cells wherein at least one cell carries a SOCS 
gene, said method comprising contacting the cell carrying the SOCS gene with an effective 
amount of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient 
to modulate signal transduction. 

34. A method according to any one of claims 31-33 wherein signal transduction is mediated 
by a cytokine, a hormone, a microbe or a microbial product, a parasite, an antigen or other 
effector molecule. 

35. A method according to claim 34 wherein the cytokine is one or more of EPO, TPO,
G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or
M-CSF.

36. A method according to claim 35 wherein the cytokine is one or more of IL-6, LIF, OSM,
IFN-γ and/or thrombopoietin.



37. A method according to claim 36 wherein the cytokine is IL-6. 

38. A method according to any one of claims 31-37 wherein the SOCS gene encodes a 
protein having a SOCS box comprising the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;

X2 is any amino acid residue;

X3 is P, T or S;

X4 is L, I, V, M, A or P;

X5 is any amino acid;

X6 is any amino acid;

X7 is L, I, V, M, A, F, Y or W;

X8 is C, T or S;

X9 is R, K or H;

X10 is any amino acid;

X11 is any amino acid;

X12 is L, I, V, M, A or P;

X13 is any amino acid;

X14 is any amino acid;

X15 is any amino acid;

X16 is L, I, V, M, A, P, G, C, T or S;

[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;

X17 is L, I, V, M, A or P;

X18 is any amino acid;

X19 is any amino acid;

X20 is L, I, V, M, A or P;

X21 is P;

X22 is L, I, V, M, A, P or G;

X23 is P or N;

[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;

X24 is L, I, V, M, A or P;

X25 is any amino acid;

X26 is any amino acid;

X27 is Y or F; and

X28 is L, I, V, M, A or P.



39. A method according to claim 38 wherein the SOCS gene comprises a nucleotide 
sequence selected from SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ 
ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO.
20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ 
ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 
34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ 
ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47. 

40. A method according to claim 38 wherein the SOCS gene encodes a protein comprising 
an amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 
8, SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ 
ID NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 
46 or SEQ ID NO. 48. 



SUBSTITUTE SHEET (RULE 26)

WO 98/20023 PCT/AU97/00729

FIGURE 1






-159 cgaggctcaagctccgggcggattctgcgtgccgctctcg

-120 ctccttggggtctgttggccggcctgtgccacccggacgcccggctcactgcctctgtct

-60 cccccatcagcgcagccccggacgctatggcccacccctccagctggcccctcgagtagg

1 MVARNQVAADNAISPAAEPR

1 ATGGTAGCACGCAACCAGGTGGCAGCCGACAATGCGATCTCCCCGGCAGCAGAGCCCCGA

21 RRSEPSSSSSSSSPAAPVRP

61 CGGCGGTCAGAGCCCTCCTCGTCCTCGTCTTCGTCCTCGCCAGCGGCCCCCGTGCGTCCC

41 RPCPAVPAPAPGDTHFRTFR

121 CGGCCCTGCCCGGCGGTCCCAGCCCCAGCCCCTGGCGACACTCACTTCCGCACCTTCCGC

61 SHSDYRRITRTSALLDACGF

181 TCCCACTCCGATTACCGGCGCATCACGCGGACCAGCGCGCTCCTGGACGCCTGCGGCTTC

81 YWGPLSVHGAHERLRAEPVG

241 TATTGGGGACCCCTGAGCGTGCACGGGGCGCACGAGCGGCTGCGTGCCGAGCCCGTGGGC

101 TFLVRDSRQRNCFFALSVKM

301 ACCTTCTTGGTGCGCGACAGTCGTCAACGGAACTGCTTCTTCGCGCTCAGCGTGAAGATG

121 ASGPTSIRVHFQAGRFHLDG

361 GCTTCGGGCCCCACGAGCATCCGCGTGCACTTCCAGGCCGGCCGCTTCCACTTGGACGGC

141 SRETFDCLFELLEHYVAAPR

421 AGCCGCGAGACCTTCGACTGCCTTTTCGAGCTGCTGGAGCACTACGTGGCGGCGCCGCGC

161 RMLGAPLRQRRVRPLQELCR

481 CGCATGTTGGGGGCCCCGCTGCGCCAGCGCCGCGTGCGGCCGCTGCAGGAGCTGTGTCGC

181 QRIVAAVGRENLARIPLNPV

541 CAGCGCATCGTGGCCGCCGTGGGTCGCGAGAACCTGGCGCGCATCCCTCTTAACCCGGTA

201 LRDYLSSFPFQI*

601 CTCCGTGACTACCTGAGTTCCTTCCCCTTCCAGATCtgaccggctgccgctgtgccgcag

661 cattaagtgggggcgccttattatttcttattattaattattattatttttctggaacca

721 cgtgggagccctccccgcctgggtcggagggagtggttgtggagggtgagatgcctccca

781 cttctggctggagacctcatcccacctctcaggggtgggggtgctcccctcctggtgctc

841 cctccgggtcccccctggttgtagcagcttgtgtctggggccaggacctgaattccactc

901 ctacctctccatgtttacatattcccagtatctttgcacaaaccaggggtcggggagggt

961 ctctggcttcatttttctgctgtgcagaatatcctattttatatttttacagccagttta

1021 gataataaactttattatgaaagtttttttttaaaagaaaaaaaaaaaaaaaaaa

FIG 3B
FIG 9(ii)

FIG 9(iii)

FIG 9

FIG 13A(i)

FIG 13A(ii)

FIG 13B(ii)

FIG 13C(ii)

FIG 13D

FIG 13E(ii)

FIG 13F(ii)